[GitHub] spark issue #16189: [SPARK-18761][CORE] Introduce "task reaper" to oversee t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16189 **[Test build #69914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69914/consoleFull)** for PR 16189 at commit [`7c65a74`](https://github.com/apache/spark/commit/7c65a749f502e40fcd9dcf791429dee274687b2a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" to over...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16189 **[Test build #69912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69912/consoleFull)** for PR 16189 at commit [`99f7b4f`](https://github.com/apache/spark/commit/99f7b4f169c12db3aa2802ffbce7db4e79388d06).
[GitHub] spark issue #16144: [MINOR][CORE][SQL][DOCS] Typo fixes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16144 **[Test build #69913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69913/consoleFull)** for PR 16144 at commit [`fee4fb6`](https://github.com/apache/spark/commit/fee4fb60cb7a8a5e936a2d1679cbe430b845e6cd).
[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91671465

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -432,6 +435,57 @@ private[spark] class Executor(
   }

   /**
+   * Supervises the killing / cancellation of a task by sending the interrupted flag, optionally
+   * sending a Thread.interrupt(), and monitoring the task until it finishes.
+   */
+  private class TaskReaper(taskRunner: TaskRunner, interruptThread: Boolean) extends Runnable {
+
+    private[this] val killPollingFrequencyMs: Long =
+      conf.getTimeAsMs("spark.task.killPollingFrequency", "10s")
+
+    private[this] val killTimeoutMs: Long = conf.getTimeAsMs("spark.task.killTimeout", "2m")
+
+    private[this] val takeThreadDump: Boolean =
+      conf.getBoolean("spark.task.threadDumpKilledTasks", true)
+
+    override def run(): Unit = {
+      val startTimeMs = System.currentTimeMillis()
+      def elapsedTimeMs = System.currentTimeMillis() - startTimeMs
+
+      while (!taskRunner.isFinished && elapsedTimeMs < killTimeoutMs) {
+        taskRunner.kill(interruptThread = interruptThread)
+        taskRunner.synchronized {
+          Thread.sleep(killPollingFrequencyMs)
+        }
+        if (!taskRunner.isFinished) {
+          logWarning(s"Killed task ${taskRunner.taskId} is still running after $elapsedTimeMs ms")
+          if (takeThreadDump) {
+            try {
+              val threads = Utils.getThreadDump()
+              threads.find(_.threadName == taskRunner.threadName).foreach { thread =>
+                logWarning(s"Thread dump from task ${taskRunner.taskId}:\n${thread.stackTrace}")
+              }
+            } catch {
+              case NonFatal(e) =>
+                logWarning("Exception thrown while obtaining thread dump: ", e)
+            }
+          }
+        }
+      }
+      if (!taskRunner.isFinished && killTimeoutMs > 0 && elapsedTimeMs > killTimeoutMs) {
--- End diff --

I thought about this and it seems like there are only two possibilities here:

1. We're running in local mode, in which case we don't actually want to throw an exception to kill the JVM, and even if we did throw, it would keep on running because there's no uncaught exception handler here.
2. We're running in a separate JVM, in which case any exception thrown in this thread and not caught will cause the JVM to exit.

The only place in the body of this code that might actually throw unexpected exceptions is the thread dump, which is already in a `try-catch` block to prevent exceptions from bubbling up. Thus the only purpose of a finally block would be to detect whether it was reached via an exception path and to log a warning stating that task kill progress will no longer be monitored. Basically, I'm not sure what the finally block is buying us in terms of actionable / useful logs, and it's only going to add complexity, because then we need to be careful not to throw from the finally block in case it was entered via an exception, etc.
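The supervision loop under discussion can be sketched in isolation. `TaskLike`, `reap`, and the timing constants below are hypothetical stand-ins for Spark's `TaskRunner`/`TaskReaper`, not the actual classes; only the control flow (poll, re-send the kill signal, stop on finish or timeout) mirrors the diff above.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for Spark's TaskRunner: tracks kill requests and completion.
final class TaskLike {
  private val finished = new AtomicBoolean(false)
  @volatile var killRequests: Int = 0
  def kill(): Unit = { killRequests += 1 }        // record each kill signal sent
  def markFinished(): Unit = finished.set(true)
  def isFinished: Boolean = finished.get
}

// Poll until the task finishes or the timeout elapses; returns true if it finished.
def reap(task: TaskLike, pollMs: Long, timeoutMs: Long): Boolean = {
  val start = System.currentTimeMillis()
  def elapsed = System.currentTimeMillis() - start
  while (!task.isFinished && elapsed < timeoutMs) {
    task.kill()                                   // re-send the kill on every round
    Thread.sleep(pollMs)
    if (!task.isFinished)
      println(s"task still running after $elapsed ms") // Spark would log a thread dump here
  }
  task.isFinished
}

val task = new TaskLike
// Simulate a task that only stops once it has observed a kill request.
new Thread(() => {
  while (task.killRequests == 0) Thread.sleep(5)
  task.markFinished()
}).start()
val finishedInTime = reap(task, pollMs = 10, timeoutMs = 2000)
```

If the loop instead exits with `finishedInTime == false`, that is the timeout branch at the end of the diff, where the reaper decides whether to escalate.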
[GitHub] spark pull request #16176: [SPARK-18746][SQL] Add newBigDecimalEncoder
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16176#discussion_r91670818

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala ---
@@ -74,6 +74,9 @@ abstract class SQLImplicits {
   /** @since 1.6.0 */
   implicit def newStringEncoder: Encoder[String] = Encoders.STRING

+  /** @since 2.2.0 */
+  implicit def newDecimalEncoder: Encoder[java.math.BigDecimal] = Encoders.DECIMAL
+
--- End diff --

yea, we should
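For context on why one added implicit is enough: a minimal, self-contained model of implicit encoder resolution. `Enc`, `Encs`, and `encoderFor` are made-up stand-ins for Spark's `Encoder`/`Encoders`/`SQLImplicits`, shown only to illustrate how the new implicit makes the `java.math.BigDecimal` instance summonable without user boilerplate.

```scala
// Toy typeclass standing in for Spark's Encoder[T].
trait Enc[T] { def name: String }

object Encs {
  implicit val stringEnc: Enc[String] =
    new Enc[String] { def name = "STRING" }
  // Analogue of the line added by this PR: an implicit instance that makes
  // Enc[java.math.BigDecimal] resolvable wherever Encs is in scope.
  implicit val decimalEnc: Enc[java.math.BigDecimal] =
    new Enc[java.math.BigDecimal] { def name = "DECIMAL" }
}

import Encs._

// Summon an encoder the way Dataset operations summon Encoder[T].
def encoderFor[T](implicit e: Enc[T]): String = e.name

val resolved = encoderFor[java.math.BigDecimal]
```

Without the `decimalEnc` instance, `encoderFor[java.math.BigDecimal]` would simply fail to compile; that is the gap the PR closes for users importing `spark.implicits._`.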
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16135 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69903/
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16214 @mengxr I think there is no concurrency issue (at least I did not find one) when multiple executors run on the same machine, since native R uses a lock to protect this. You can verify it by installing a package to a shared directory on a single node in a multi-threaded way. However, we should recommend installing packages to an executor-associated directory, so that each executor has its own R lib directory even when executors share a machine. For YARN mode this works well, since each executor natively has its own R lib directory. @felixcheung I updated the examples and added a corresponding section to the SparkR user guide. I'm not sure whether such an example is appropriate, since it has many dependencies on the environment. But I think that for an interactive analytical tool, installing packages across the session is not very rare. Still, I'm open to hearing your thoughts, thanks.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16135 Merged build finished. Test PASSed.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16135 **[Test build #69903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69903/consoleFull)** for PR 16135 at commit [`82cf00e`](https://github.com/apache/spark/commit/82cf00e3522e90d153ed0d7481b838d415a8a383). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16175: [SPARK-17460][SQL]check if statistics.sizeInBytes...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16175#discussion_r91670157

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -115,7 +115,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
    */
   private def canBroadcast(plan: LogicalPlan): Boolean = {
     plan.statistics.isBroadcastable ||
-      plan.statistics.sizeInBytes <= conf.autoBroadcastJoinThreshold
+      (plan.statistics.sizeInBytes >= 0 &&
--- End diff --

ok, it's fine to add `>= 0` here
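A small standalone sketch of why the `>= 0` guard matters: if an unknown size were ever represented by a negative sentinel, a plain `<=` threshold check would wrongly allow a broadcast. `Stats` and `threshold` below are hypothetical stand-ins for `plan.statistics` and `conf.autoBroadcastJoinThreshold`.

```scala
// Hypothetical stand-in for a logical plan's statistics.
case class Stats(isBroadcastable: Boolean, sizeInBytes: BigInt)

// Illustrative threshold, e.g. a 10 MB autoBroadcastJoinThreshold.
val threshold = BigInt(10L * 1024 * 1024)

// Check as it was before the patch: no lower bound on sizeInBytes.
def canBroadcastUnguarded(s: Stats): Boolean =
  s.isBroadcastable || s.sizeInBytes <= threshold

// Check with the guard under review: negative (invalid) sizes never broadcast.
def canBroadcastGuarded(s: Stats): Boolean =
  s.isBroadcastable || (s.sizeInBytes >= 0 && s.sizeInBytes <= threshold)

// A negative size models a broken/unknown estimate.
val unknown = Stats(isBroadcastable = false, sizeInBytes = BigInt(-1))
val small = Stats(isBroadcastable = false, sizeInBytes = BigInt(1024))
```

The unguarded version would broadcast the `unknown` plan (since `-1 <= threshold`), potentially OOMing the driver on a huge table; the guarded version still broadcasts genuinely small plans like `small`.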
[GitHub] spark issue #16231: [MINOR][SPARKR] Fix SparkR regex in copy command
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16231 This is from the error log in https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.1-package/8/console @felixcheung This is too many errors for my own good! I'm going to let this sit overnight and take a close look tomorrow before merging and testing.
[GitHub] spark issue #16230: [SPARK-13747][Core]Fix potential ThreadLocal leaks in RP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16230 **[Test build #69910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69910/consoleFull)** for PR 16230 at commit [`68ae0ed`](https://github.com/apache/spark/commit/68ae0ed3266787944952c01bdd4f1eb7cbfc42a0).
[GitHub] spark issue #16231: [MINOR][SPARKR] Fix SparkR regex in copy command
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16231 **[Test build #69909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69909/consoleFull)** for PR 16231 at commit [`e86889d`](https://github.com/apache/spark/commit/e86889d7b729346b1a9bcb67529b3c17e1d1fb26).
[GitHub] spark issue #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation for joi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16228 **[Test build #69911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69911/consoleFull)** for PR 16228 at commit [`64603b5`](https://github.com/apache/spark/commit/64603b589bd1b971d2370d848f1dd19f11b52928).
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16214 Merged build finished. Test PASSed.
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69908/
[GitHub] spark pull request #16231: [MINOR][SPARKR] Fix SparkR regex in copy command
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/16231

[MINOR][SPARKR] Fix SparkR regex in copy command

Fix SparkR package copy regex. The existing code leads to

```
Copying release tarballs to /home//public_html/spark-nightly/spark-branch-2.1-bin/spark-2.1.1-SNAPSHOT-2016_12_08_22_38-e8f351f-bin
mput: SparkR-*: no files found
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shivaram/spark-1 typo-sparkr-build

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16231.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16231

commit e86889d7b729346b1a9bcb67529b3c17e1d1fb26
Author: Shivaram Venkataraman
Date: 2016-12-09T07:29:43Z

    Fix SparkR regex in copy command
[GitHub] spark issue #16230: [SPARK-13747][Core]Fix potential ThreadLocal leaks in RP...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16230 cc @yhuai since you helped review #15520
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16214 **[Test build #69908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69908/consoleFull)** for PR 16214 at commit [`ce18e2e`](https://github.com/apache/spark/commit/ce18e2eb181ddf6079640049e2ed2d18bb8ae03c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16230: [SPARK-13747][Core]Fix potential ThreadLocal leak...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16230

[SPARK-13747][Core] Fix potential ThreadLocal leaks in RPC when using ForkJoinPool

## What changes were proposed in this pull request?

Some places in SQL may call `RpcEndpointRef.askWithRetry` (e.g., ParquetFileFormat.buildReader -> SparkContext.broadcast -> ... -> BlockManagerMaster.updateBlockInfo -> RpcEndpointRef.askWithRetry), which will finally call `Await.result`. It may cause `java.lang.IllegalArgumentException: spark.sql.execution.id is already set` when running in a Scala ForkJoinPool. This PR includes the following changes to fix this issue:

- Removed `ThreadUtils.awaitResult`
- Renamed `ThreadUtils.awaitResultInForkJoinSafely` to `ThreadUtils.awaitResult`
- Replaced `Await.result` in RpcTimeout with `ThreadUtils.awaitResult`

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark fix-SPARK-13747

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16230.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16230

commit 68ae0ed3266787944952c01bdd4f1eb7cbfc42a0
Author: Shixiong Zhu
Date: 2016-12-09T07:22:46Z

    Fix potential ThreadLocal leaks when using ForkJoinPool
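The hazard behind this PR can be reproduced without any Spark code: a pooled thread is reused across tasks, so a `ThreadLocal` set by one task (and not cleared) is still visible to the next task on that thread. The value `"execution-id-42"` below is a made-up analogue of `spark.sql.execution.id`.

```scala
import java.util.concurrent.{Callable, Executors}

val local = new ThreadLocal[String]
// Single-thread pool guarantees both tasks run on the same (reused) thread.
val pool = Executors.newSingleThreadExecutor()

// Task 1 sets the ThreadLocal and "forgets" to clean it up before returning.
pool.submit(new Runnable {
  def run(): Unit = local.set("execution-id-42")
}).get()

// Task 2 runs later on the same pooled thread and observes the stale value.
val leaked = pool.submit(new Callable[String] {
  def call(): String = local.get()
}).get()

pool.shutdown()
```

With a ForkJoinPool the same reuse happens, except threads are also shared with unrelated work that blocks in `Await.result`, which is why the PR routes blocking waits through a single helper instead of leaving per-thread state behind.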
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69904/
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16220 Merged build finished. Test PASSed.
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16220 **[Test build #69904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69904/consoleFull)** for PR 16220 at commit [`7b2bf04`](https://github.com/apache/spark/commit/7b2bf04bb9faafd0c2202d7d0309df9f397077c5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91669101

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -487,6 +489,26 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
     assert(count4 == 1)
     sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+    withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'")
+      sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'")
+      val testData = spark.range(10).repartition(1)
+
+      // Expected Max(s) is 10 as statefulUDF returns the sequence number starting from 1.
+      checkAnswer(testData.selectExpr("statefulUDF() as s").agg(max($"s")), Row(10))
+
+      // Expected Max(s) is 5 as statefulUDF returns the sequence number starting from 1,
+      // and the data is evenly distributed into 2 partitions.
--- End diff --

oh you are right, I misread the code.
[GitHub] spark pull request #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16228#discussion_r91668803

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/estimation/JoinEstimation.scala ---
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical.estimation
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{ColumnStat, Join, Statistics}
+import org.apache.spark.sql.types.DataType
+
+
+object JoinEstimation {
+  import EstimationUtils._
+
+  // scalastyle:off
+  /**
+   * Estimate output size and number of rows after a join operator, and propagate updated column
+   * statistics.
+   * The number of rows of A inner join B on A.k1 = B.k1 is estimated by this basic formula:
+   * T(A IJ B) = T(A) * T(B) / max(V(A.k1), V(B.k1)), where V is the number of distinct values of
+   * that column. The underlying assumption for this formula is: each value of the smaller domain
+   * is included in the larger domain.
+   * Generally, inner join with multiple join keys can also be estimated based on the above
+   * formula:
+   * T(A IJ B) = T(A) * T(B) / (max(V(A.k1), V(B.k1)) * max(V(A.k2), V(B.k2)) * ... * max(V(A.kn), V(B.kn)))
+   * However, the denominator can become very large and excessively reduce the result, so we use a
+   * conservative strategy to take only the largest max(V(A.ki), V(B.ki)) as the denominator.
--- End diff --

Here, Hive uses an exponential decay to compute the denominator when the number of join keys > the number of join tables, i.e. ndv1 * ndv2^(1/2) * ndv3^(1/4)... I just use a more conservative strategy: max(ndv1, ndv2, ...). I don't know which one is better. Do you know of any theoretical or empirical support for Hive's strategy? @rxin @srinathshankar
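To make the two denominator strategies being compared concrete, here is a runnable numeric sketch of the formula quoted above. All row counts and NDVs are invented for illustration; `conservative` is the patch's max-NDV choice and `exponentialDecay` is the Hive-style ndv1 * ndv2^(1/2) * ... decay mentioned in the comment.

```scala
// Per-key NDV pairs (V(A.ki), V(B.ki)); take max of each pair per the formula.
def conservative(ndvPairs: Seq[(Long, Long)]): Double =
  ndvPairs.map { case (a, b) => math.max(a, b).toDouble }.max

// Hive-style exponential decay: largest NDV taken fully, next to the 1/2
// power, next to the 1/4 power, and so on.
def exponentialDecay(ndvPairs: Seq[(Long, Long)]): Double =
  ndvPairs.map { case (a, b) => math.max(a, b).toDouble }
    .sortBy(-_)
    .zipWithIndex
    .map { case (ndv, i) => math.pow(ndv, 1.0 / (1L << i)) }
    .product

// Invented example: T(A) = 1M rows, T(B) = 500K rows, two equi-join keys.
val rowsA = 1000000L
val rowsB = 500000L
val ndvs = Seq((1000L, 800L), (50L, 40L))

// T(A IJ B) = T(A) * T(B) / denominator, under each strategy.
val estConservative = rowsA * rowsB / conservative(ndvs)   // denominator 1000
val estDecay = rowsA * rowsB / exponentialDecay(ndvs)      // denominator 1000 * 50^0.5
```

The decay denominator is always at least the conservative one, so the conservative strategy yields the larger (safer, less likely to under-provision the join) row-count estimate.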
[GitHub] spark issue #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation for joi...
Github user Tagar commented on the issue: https://github.com/apache/spark/pull/16228 2. That is great. Would it be easier to use a FK when one is available (HMS has had FKs since Hive 2.1: https://issues.apache.org/jira/browse/HIVE-13076), and fall back to stats when no FK between the columns is defined? Also, do I understand correctly that the assumption is: if two tables are joined by columns with the same name, the join columns have the same stats / set of values?
[GitHub] spark pull request #16214: [SPARK-18325][SPARKR] Add example for using nativ...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16214#discussion_r91667478

--- Diff: examples/src/main/r/native-r-package.R ---
@@ -0,0 +1,80 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This example illustrates how to use third-party R packages in your task
+# which is distributed by Spark. We support two scenarios:
+# - Install packages from CRAN to executors directly.
+# - Install packages from local file system to executors.
+#
+# To run this example use
+# ./bin/spark-submit examples/src/main/r/native-r-package.R
+
+# Load SparkR library into your R session
+library(SparkR)
+
+# Initialize SparkSession
+sparkR.session(appName = "SparkR-native-r-package-example")
+
+# Get the location of the default library
+libDir <- .libPaths()[1]
--- End diff --

Good suggestion!
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16214 **[Test build #69908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69908/consoleFull)** for PR 16214 at commit [`ce18e2e`](https://github.com/apache/spark/commit/ce18e2eb181ddf6079640049e2ed2d18bb8ae03c).
[GitHub] spark issue #16215: [Streaming] Update PairDStreamFunctions.scala
Github user Jimmy-Newtron commented on the issue: https://github.com/apache/spark/pull/16215 I think these new API methods would bring consistency between PairRDDFunctions and PairDStreamFunctions by providing the same API to access the same data. The simplest use case for this new API is a streaming application that consumes a Kafka topic to retrieve the values. I accept the PR rejection, even though the new API matches only positive aspects of the Code Review Criteria. Positives: adds functionality needed by a large number of users; simple and targeted. Negatives / risks: none. Best regards
[GitHub] spark pull request #16227: Copy pyspark and SparkR packages to latest releas...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16227
[GitHub] spark issue #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation for joi...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16228 Two issues are still undecided: 1. Where can we turn cbo estimation on/off? I think we need such a switch, because otherwise we will use statistics as long as they are in the metastore, even though they can become stale. 2. Currently we use the column name as the key for column statistics, which is problematic: if the output of a join has columns from different tables with the same column name, they can't be distinguished. Can we use a combined string like table name + column name? Would that cause problems in the case of table aliases?
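The keying problem described in point 2 above can be sketched concretely. This is a hypothetical illustration with made-up table and column names, not the PR's actual data structures:

```python
# Sketch of the collision wzhfy describes: keying column statistics
# by bare column name loses one table's stats when a join output
# carries same-named columns from different tables; qualifying the
# key with the table name keeps them distinct.

tables = [("orders", "id", 1000), ("users", "id", 50)]  # (table, column, ndv)

stats_by_bare_name = {}
for table, column, ndv in tables:
    stats_by_bare_name[column] = ndv  # "id" collides: last write wins

stats_by_qualified_name = {}
for table, column, ndv in tables:
    stats_by_qualified_name[f"{table}.{column}"] = ndv  # no collision

print(len(stats_by_bare_name))       # 1: one table's stats were silently lost
print(len(stats_by_qualified_name))  # 2: both survive
```

The alias question remains real even with qualified keys: after `orders o JOIN orders o2`, a plain `table.column` key would still collide, which is presumably why the question is left open above.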
[GitHub] spark issue #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation for joi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16228 **[Test build #69907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69907/consoleFull)** for PR 16228 at commit [`f0bbb43`](https://github.com/apache/spark/commit/f0bbb43f7b27c59a3181ce428ed2bb0a7c1fc89d).
[GitHub] spark issue #16227: Copy pyspark and SparkR packages to latest release dir t...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16227 LGTM. Merging this to master, branch-2.1
[GitHub] spark issue #16227: Copy pyspark and SparkR packages to latest release dir t...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16227 Ah, thanks. Didn't see this - I'll close #16229
[GitHub] spark issue #16229: [MINOR][SPARKR][PYSPARK] Copy PySpark, SparkR archives t...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16229 Closing in favor of #16227
[GitHub] spark pull request #16229: [MINOR][SPARKR][PYSPARK] Copy PySpark, SparkR arc...
Github user shivaram closed the pull request at: https://github.com/apache/spark/pull/16229
[GitHub] spark pull request #16226: Copy the SparkR source package with LFTP
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16226#discussion_r91666064

--- Diff: dev/create-release/release-build.sh ---
```diff
@@ -258,6 +258,7 @@ if [[ "$1" == "package" ]]; then
   LFTP mkdir -p $dest_dir
   LFTP mput -O $dest_dir 'spark-*'
   LFTP mput -O $dest_dir 'pyspark-*'
+  LFTP mput -O $dest_dir 'SparkR-*'
```
--- End diff --

opened https://github.com/apache/spark/pull/16229
[GitHub] spark issue #16229: [MINOR][SPARKR][PYSPARK] Copy PySpark, SparkR archives t...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16229 cc @felixcheung @holdenk @rxin
[GitHub] spark issue #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation for joi...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16228 cc @rxin @srinathshankar @cloud-fan
[GitHub] spark pull request #16229: [MINOR][SPARKR][PYSPARK] Copy PySpark, SparkR arc...
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/16229 [MINOR][SPARKR][PYSPARK] Copy PySpark, SparkR archives to latest/ This change copies the pip / CRAN compatible source archives to latest/ during the release build. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shivaram/spark-1 copy-pyspark-sparkr-latest-build Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16229.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16229 commit 4f89fa7b2ac1e80e7ca8198b505989c3722fe941 Author: Shivaram Venkataraman Date: 2016-12-09T06:41:32Z Copy PySpark, SparkR archives to latest/
[GitHub] spark pull request #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation ...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/16228 [WIP] [SPARK-17076] [SQL] Cardinality estimation for join based on basic column statistics ## What changes were proposed in this pull request? This is a WIP PR. Currently we support estimation for inner equi-joins; outer joins and left semi joins will be supported afterwards. ## How was this patch tested? Just a simple test case; more tests need to be added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wzhfy/spark joinEstimate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16228.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16228 commit f0bbb43f7b27c59a3181ce428ed2bb0a7c1fc89d Author: wangzhenhua Date: 2016-12-09T06:34:41Z estimation for inner join
[GitHub] spark issue #16227: Copy pyspark and SparkR packages to latest release dir t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16227 **[Test build #69906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69906/consoleFull)** for PR 16227 at commit [`afab186`](https://github.com/apache/spark/commit/afab1868413f11f543cbc3869294799e9ff9f674).
[GitHub] spark issue #16227: Copy pyspark and SparkR packages to latest release dir t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16227 @shivaram
[GitHub] spark pull request #16227: Copy pyspark and SparkR packages to latest releas...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/16227 Copy pyspark and SparkR packages to latest release dir too ## What changes were proposed in this pull request? Copy pyspark and SparkR packages to latest release dir @shivaram You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixcheung/spark pyrftp Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16227.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16227 commit afab1868413f11f543cbc3869294799e9ff9f674 Author: Felix Cheung Date: 2016-12-09T06:28:21Z ftp to latest too
[GitHub] spark pull request #16226: Copy the SparkR source package with LFTP
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16226#discussion_r91664822

--- Diff: dev/create-release/release-build.sh ---
```diff
@@ -258,6 +258,7 @@ if [[ "$1" == "package" ]]; then
   LFTP mkdir -p $dest_dir
   LFTP mput -O $dest_dir 'spark-*'
   LFTP mput -O $dest_dir 'pyspark-*'
+  LFTP mput -O $dest_dir 'SparkR-*'
```
--- End diff --

I checked http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest/ - pyspark-* is missing in all latest dirs
[GitHub] spark pull request #16226: Copy the SparkR source package with LFTP
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16226#discussion_r91664625

--- Diff: dev/create-release/release-build.sh ---
```diff
@@ -258,6 +258,7 @@ if [[ "$1" == "package" ]]; then
   LFTP mkdir -p $dest_dir
   LFTP mput -O $dest_dir 'spark-*'
   LFTP mput -O $dest_dir 'pyspark-*'
+  LFTP mput -O $dest_dir 'SparkR-*'
```
--- End diff --

hmmm.. shouldn't pyspark-* and SparkR-* be in the latest dir too (see L253)?
[GitHub] spark pull request #16226: Copy the SparkR source package with LFTP
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16226
[GitHub] spark issue #16000: [SPARK-18537][Web UI]Add a REST api to spark streaming
Github user saturday-shi commented on the issue: https://github.com/apache/spark/pull/16000 @vanzin Hello, I'm a collaborator on this PR. Actually I am interested in your plan, but we don't want to make the changes here because that is not the purpose of this PR. I think I can open a new PR and implement the changes there. @uncleGen I reviewed your code and found that there are a lot of things to improve. I prefer to use the existing ones in this PR to avoid duplicating work. I will open a new PR later, but if you already have a plan please let me know. Maybe I can work on it with you.
[GitHub] spark issue #16221: [SPARKR][PYSPARK] Fix R source package name to match Spa...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16221 I tested a source package with a different version in filename vs DESCRIPTION and it seems to be working fine.
[GitHub] spark pull request #16223: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16223
[GitHub] spark issue #16223: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16223 Merged to master
[GitHub] spark issue #16226: Copy the SparkR source package with LFTP
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16226 Merging to master, branch-2.1 - and testing again on nightly
[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16150
[GitHub] spark issue #16150: [SPARK-18349][SparkR]:Update R API documentation on ml m...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16150 merged to master and branch-2.1
[GitHub] spark issue #16226: Copy the SparkR source package with LFTP
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16226 ouch. LGTM.
[GitHub] spark issue #16226: Copy the SparkR source package with LFTP
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16226 **[Test build #69905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69905/consoleFull)** for PR 16226 at commit [`7afd48c`](https://github.com/apache/spark/commit/7afd48ce0f92af0bbb6a303df94ad55e27b71ded).
[GitHub] spark issue #16226: Copy the SparkR source package with LFTP
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16226 cc @felixcheung @rxin This should hopefully be the last one for this - FYI the source archive is named `SparkR_$SPARK_VERSION.tar.gz` and we were only copying files named `spark-*` and `pyspark-*` before
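The naming mismatch described in the comment above is easy to reproduce with shell-style globbing. This is an illustrative sketch with made-up artifact filenames; only the `SparkR_` naming convention comes from the comment itself:

```python
# Sketch of why the old copy patterns missed the R package: the globs
# 'spark-*' and 'pyspark-*' match neither the capital S/R nor the
# underscore in a name like SparkR_2.1.1.tar.gz (names illustrative).
from fnmatch import fnmatchcase

artifacts = [
    "spark-2.1.1-SNAPSHOT-bin-hadoop2.7.tgz",
    "pyspark-2.1.1.dev0.tar.gz",
    "SparkR_2.1.1.tar.gz",
]
old_patterns = ["spark-*", "pyspark-*"]

copied = [a for a in artifacts
          if any(fnmatchcase(a, p) for p in old_patterns)]
missed = [a for a in artifacts if a not in copied]
print(missed)  # ['SparkR_2.1.1.tar.gz']
```

Hence the one-line fix in release-build.sh adding an `mput` for `SparkR-*`-style names alongside the existing patterns.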
[GitHub] spark pull request #16226: Copy the SparkR source package with LFTP
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/16226 Copy the SparkR source package with LFTP This PR adds a line in release-build.sh to copy the SparkR source archive using LFTP You can merge this pull request into a Git repository by running: $ git pull https://github.com/shivaram/spark-1 fix-sparkr-copy-build Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16226.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16226 commit 7afd48ce0f92af0bbb6a303df94ad55e27b71ded Author: Shivaram Venkataraman Date: 2016-12-09T05:43:23Z Copy the SparkR source package with LFTP
[GitHub] spark issue #16221: [SPARKR][PYSPARK] Fix R source package name to match Spa...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16221 FYI the pip issue is fixed as you can see in the nightly build at http://people.apache.org/~pwendell/spark-nightly/spark-branch-2.1-bin/spark-2.1.1-SNAPSHOT-2016_12_08_18_31-ef5646b-bin/ --
```
spark-2.1.1-SNAPSHOT-bin-hadoop2.7.tgz    2016-12-09 02:52    187M
```
Further the SparkR build was successful [1] but we are right now missing a line to copy the source archive with FTP - I am sending a PR for that [1] https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.1-package/7/console
[GitHub] spark issue #16225: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16225 Can one of the admins verify this patch?
[GitHub] spark pull request #16225: [SPARK-14932][SQL] Allow DataFrame.replace() to r...
GitHub user bravo-zhang opened a pull request: https://github.com/apache/spark/pull/16225 [SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None ## What changes were proposed in this pull request? Allow DataFrame.replace() to replace with None/null values. ## How was this patch tested? Python doctest. Scala and Java local test. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bravo-zhang/spark spark-14932 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16225.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16225 commit 2653750762c12cf5d17b8a2ac2a7ee9f8d55bfec Author: bravo-zhang Date: 2016-12-09T05:22:31Z [SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None
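The behavior this PR enables can be modeled in plain Python. This is a hedged sketch of the intended semantics only, not PySpark code and not the PR's implementation; `replace_values` and the row data are made up for illustration:

```python
# Plain-Python model of the semantics SPARK-14932 proposes for
# DataFrame.replace(): the replacement value is allowed to be None
# (i.e. matched values become null) instead of being required non-null.

def replace_values(rows, to_replace, value):
    """Replace every cell whose value is in `to_replace` with `value`,
    where `value` may be None - the case the PR enables."""
    targets = set(to_replace)
    return [
        {k: (value if v in targets else v) for k, v in row.items()}
        for row in rows
    ]

rows = [{"name": "Alice", "age": 10}, {"name": "unknown", "age": 20}]
print(replace_values(rows, ["unknown"], None))
# [{'name': 'Alice', 'age': 10}, {'name': None, 'age': 20}]
```

Before this change, passing None as the replacement value was not supported, so sentinel strings like "unknown" could not be converted into real nulls in one call.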
[GitHub] spark pull request #16216: [SPARK-18774][CORE][SQL] Ignore non-existing file...
Github user zsxwing closed the pull request at: https://github.com/apache/spark/pull/16216
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16220 **[Test build #69904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69904/consoleFull)** for PR 16220 at commit [`7b2bf04`](https://github.com/apache/spark/commit/7b2bf04bb9faafd0c2202d7d0309df9f397077c5).
[GitHub] spark issue #16222: [SPARK-18797][SparkR]:Update spark.logit in sparkr-vigne...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16222 @mengxr 1. Yes, I can add an explanation of what "logistic regression" is. 2. "We shouldn't make the vignettes repeat the content in the API doc." Do you mean removing the parameter explanation? I saw other algorithms have such explanations. Any suggestions on replacing them? Thanks!
[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91570259

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---

```diff
@@ -487,6 +489,26 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
     assert(count4 == 1)
     sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+    withUserDefinedFunction("statefulUDF" -> true, "statelessUDF" -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'")
+      sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'")
+      val testData = spark.range(10).repartition(1)
+
+      // Expected Max(s) is 10 as statefulUDF returns the sequence number starting from 1.
+      checkAnswer(testData.selectExpr("statefulUDF() as s").agg(max($"s")), Row(10))
+
+      // Expected Max(s) is 5 as statefulUDF returns the sequence number starting from 1,
+      // and the data is evenly distributed into 2 partitions.
```

--- End diff --

```scala
case logical.Repartition(numPartitions, shuffle, child) =>
  if (shuffle) {
    ShuffleExchange(RoundRobinPartitioning(numPartitions), planLater(child)) :: Nil
  } else {
    execution.CoalesceExec(numPartitions, planLater(child)) :: Nil
  }
case logical.RepartitionByExpression(expressions, child, nPartitions) =>
  exchange.ShuffleExchange(HashPartitioning(
    expressions, nPartitions.getOrElse(numPartitions)), planLater(child)) :: Nil
```
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/16135 Hi @ericl, this commit does the three things below; thanks for checking:
1. Deletes the unnecessary lock use and simplifies the lock operation
2. Adds a unit test in `PartitionedTablePerfStatsSuite`
3. Adds cache-hit metrics in `HiveCatalogMetrics`
I also updated the description of this PR.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16135 **[Test build #69903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69903/consoleFull)** for PR 16135 at commit [`82cf00e`](https://github.com/apache/spark/commit/82cf00e3522e90d153ed0d7481b838d415a8a383).
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69900/ Test PASSed.
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16220 Merged build finished. Test PASSed.
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16220 **[Test build #69900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69900/consoleFull)** for PR 16220 at commit [`ce1baf2`](https://github.com/apache/spark/commit/ce1baf2c8934e3b4ce57a981908afcc1b19a209e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16114 @brkyvz Could you also check this pr #16213? Thanks!
[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16210 I found the reason: because we pass some SPARK_SUBMIT_OPTS defined by ourselves, Spark only parses the opts we defined and ignores the ```-Dscala.usejavacp=true```. Since we want users to be able to set `SPARK_SUBMIT_OPTS`, the best way is to separate ```-Dscala.usejavacp=true``` from SPARK_SUBMIT_OPTS; moving it into SparkSubmitCommandBuilder is a good idea, as suggested by @vanzin.
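As a rough illustration of that separation (a hypothetical sketch, not Spark's actual launcher script; the variable names are made up): keep the REPL's classpath flag out of the user-controlled variable and append it last, so whatever the user puts in `SPARK_SUBMIT_OPTS` cannot shadow it:

```shell
#!/bin/sh
# User-supplied options: may contain anything, including conflicting flags.
SPARK_SUBMIT_OPTS="-Xmx2g -Dmy.custom.flag=1"

# The REPL flag is kept in its own (hypothetical) variable and appended
# after the user options, so it always reaches the JVM.
REPL_CLASSPATH_OPT="-Dscala.usejavacp=true"
FINAL_OPTS="$SPARK_SUBMIT_OPTS $REPL_CLASSPATH_OPT"

echo "$FINAL_OPTS"
```

The same idea applies if the flag is instead injected programmatically in SparkSubmitCommandBuilder: the point is that it is added by the launcher itself, independently of user input.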
[GitHub] spark issue #16221: [SPARKR][PYSPARK] Fix R source package name to match Spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16221 Merged build finished. Test PASSed.
[GitHub] spark issue #16221: [SPARKR][PYSPARK] Fix R source package name to match Spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16221 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69897/ Test PASSed.
[GitHub] spark issue #16221: [SPARKR][PYSPARK] Fix R source package name to match Spa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16221 **[Test build #69897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69897/consoleFull)** for PR 16221 at commit [`9b97765`](https://github.com/apache/spark/commit/9b977651bdd85659fb6ab9d00e26ee0bb6ddbb52).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16175: [SPARK-17460][SQL]check if statistics.sizeInBytes...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16175#discussion_r91654619

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---

```diff
@@ -115,7 +115,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
    */
   private def canBroadcast(plan: LogicalPlan): Boolean = {
     plan.statistics.isBroadcastable ||
-      plan.statistics.sizeInBytes <= conf.autoBroadcastJoinThreshold
+      (plan.statistics.sizeInBytes >= 0 &&
```

--- End diff --

So far, the incorrect statistics should be fine. Even if we add an `assert`, we can still disable it at runtime when it blocks the execution. Thus, either is fine to me.
[GitHub] spark pull request #16175: [SPARK-17460][SQL]check if statistics.sizeInBytes...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16175#discussion_r91654375

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---

```diff
@@ -115,7 +115,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
    */
   private def canBroadcast(plan: LogicalPlan): Boolean = {
     plan.statistics.isBroadcastable ||
-      plan.statistics.sizeInBytes <= conf.autoBroadcastJoinThreshold
+      (plan.statistics.sizeInBytes >= 0 &&
```

--- End diff --

We need to update the value of `-1` to the actual default value `Long.MaxValue` https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L139

Update the code in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L767 to

```scala
def defaultSizeInBytes: Long = getConf(DEFAULT_SIZE_IN_BYTES)
```
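The guard under discussion can be sketched in a few self-contained lines (a hypothetical illustration, not Spark's actual code; the 10 MB threshold is an assumed default). The point is that a negative `sizeInBytes`, used as an "unknown" marker, would otherwise trivially satisfy `sizeInBytes <= threshold` and trigger a bogus broadcast:

```scala
// Assumed stand-in for conf.autoBroadcastJoinThreshold (10 MB).
val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024

// Sketch of the guarded check: a negative (unknown/invalid) size must not
// pass the threshold comparison, even though -1 <= threshold is true.
def canBroadcast(isBroadcastable: Boolean, sizeInBytes: BigInt): Boolean =
  isBroadcastable ||
    (sizeInBytes >= 0 && sizeInBytes <= autoBroadcastJoinThreshold)

println(canBroadcast(isBroadcastable = false, sizeInBytes = BigInt(1024))) // small known size: eligible
println(canBroadcast(isBroadcastable = false, sizeInBytes = BigInt(-1)))   // unknown size: not eligible
```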
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16219 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69898/ Test PASSed.
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16219 Merged build finished. Test PASSed.
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16219 **[Test build #69898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69898/consoleFull)** for PR 16219 at commit [`7b6538c`](https://github.com/apache/spark/commit/7b6538c2918ef947740d863fea616af38b8d1d6b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16200: [SPARK-18773][core] Make commons-crypto config translati...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16200 @zsxwing want to take a look?
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14638 Yes. Right, that is possible, but it's a different use case. This PR supports the following:
- The **existing Hive table** with `TBLPROPERTIES(skip.header.line.count=1)` will show the same result in Hive & Spark. Previously, Spark ignored that option, so it showed an incorrect result for that table.
- And, vice versa, the **table created by Spark** will show the same result in Hive & Spark.

This is not a hidden Hive feature. It was:
- Reported by `Daniel Haviv` (Oct. 2015)
- Asked by `Stephane Maarek` (Apr. 2016)
- Asked by `Rahul Jain` (Sep. 2016)

Also, there is a StackOverflow question:
- http://stackoverflow.com/questions/38566591/spark-sql-hivecontext-dont-ignore-header
[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69902/ Test PASSed.
[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16213 Merged build finished. Test PASSed.
[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16213 **[Test build #69902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69902/consoleFull)** for PR 16213 at commit [`e8a4e1d`](https://github.com/apache/spark/commit/e8a4e1d3057d0c2df61f80b61b5de80849b97b90).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16221: [SPARKR][PYSPARK] Fix R source package name to match Spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16221 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69896/ Test FAILed.
[GitHub] spark issue #16221: [SPARKR][PYSPARK] Fix R source package name to match Spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16221 Merged build finished. Test FAILed.
[GitHub] spark issue #16221: [SPARKR][PYSPARK] Fix R source package name to match Spa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16221 **[Test build #69896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69896/consoleFull)** for PR 16221 at commit [`74d779b`](https://github.com/apache/spark/commit/74d779b8bdee8c6ad4aba69c051a3f9c87fecd3c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16210 @jodersky Yes. I tried different ways; here are the results:

```
SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true -usejavacp"
```

and

```
SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true -Dusejavacp"
```

will output

```
Exception in thread "main" java.lang.AssertionError: assertion failed: null
  at scala.Predef$.assert(Predef.scala:179)
  at org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
  at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
  at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
  at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
  at org.apache.spark.repl.Main$.main(Main.scala:31)
  at org.apache.spark.repl.Main.main(Main.scala)
```

```
SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -usejavacp"
```

will output:

```
Unrecognized option: -usejavacp
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
```
[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16204 Merged build finished. Test FAILed.
[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16204 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69899/ Test FAILed.
[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16204 **[Test build #69899 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69899/consoleFull)** for PR 16204 at commit [`ceeacde`](https://github.com/apache/spark/commit/ceeacdec05e111fbbe72dd534baacb72fbc0d454).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16224: [SPARK-18792] [R] mention spark.logit in vignettes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16224 Merged build finished. Test PASSed.
[GitHub] spark issue #16224: [SPARK-18792] [R] mention spark.logit in vignettes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69901/ Test PASSed.
[GitHub] spark issue #16224: [SPARK-18792] [R] mention spark.logit in vignettes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16224 **[Test build #69901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69901/consoleFull)** for PR 16224 at commit [`b037cec`](https://github.com/apache/spark/commit/b037cecb1b20cf1666e9af42a8bfd5e0d0bac849).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16223: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user weiqingy commented on the issue: https://github.com/apache/spark/pull/16223 Thanks @srowen for the review.
[GitHub] spark issue #16223: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16223 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69894/ Test PASSed.
[GitHub] spark issue #16223: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16223 Merged build finished. Test PASSed.
[GitHub] spark issue #16223: [SPARK-18697][BUILD] Upgrade sbt plugins
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16223 **[Test build #69894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69894/consoleFull)** for PR 16223 at commit [`062f619`](https://github.com/apache/spark/commit/062f6192ceed482d49c53249d386bb2e3afec11d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16221: [SPARKR][PYSPARK] Fix R source package name to ma...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16221