[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66801/consoleFull)** for PR 9766 at commit [`d481821`](https://github.com/apache/spark/commit/d4818217dc6e29a72a4e470dbe08cda197933162). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15360 **[Test build #66799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66799/consoleFull)** for PR 15360 at commit [`1e64163`](https://github.com/apache/spark/commit/1e641633cbd38a4a990a1cebafeff7be276a0fec).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66800/consoleFull)** for PR 15148 at commit [`1b63173`](https://github.com/apache/spark/commit/1b6317396629b9f290a279dd735923c0fc8efd89).
[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82944965

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---

```diff
@@ -1255,27 +1255,46 @@ class DAGScheduler(
             s"longer running")
         }
 
-        if (disallowStageRetryForTest) {
-          abortStage(failedStage, "Fetch failure will not retry stage due to testing config",
-            None)
-        } else if (failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId)) {
-          abortStage(failedStage, s"$failedStage (${failedStage.name}) " +
-            s"has failed the maximum allowable number of " +
-            s"times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}. " +
-            s"Most recent failure reason: ${failureMessage}", None)
-        } else {
-          if (failedStages.isEmpty) {
-            // Don't schedule an event to resubmit failed stages if failed isn't empty, because
-            // in that case the event will already have been scheduled.
-            // TODO: Cancel running tasks in the stage
-            logInfo(s"Resubmitting $mapStage (${mapStage.name}) and " +
-              s"$failedStage (${failedStage.name}) due to fetch failure")
-            messageScheduler.schedule(new Runnable {
-              override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
-            }, DAGScheduler.RESUBMIT_TIMEOUT, TimeUnit.MILLISECONDS)
+        val shouldAbortStage =
+          failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) ||
+          disallowStageRetryForTest
+
+        if (shouldAbortStage) {
+          val abortMessage = if (disallowStageRetryForTest) {
+            "Fetch failure will not retry stage due to testing config"
+          } else {
+            s"""$failedStage (${failedStage.name})
+               |has failed the maximum allowable number of
+               |times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}.
+               |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
           }
+          abortStage(failedStage, abortMessage, None)
+        } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
+          // TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
+          val noResubmitEnqueued = !failedStages.contains(failedStage)
           failedStages += failedStage
           failedStages += mapStage
+          if (noResubmitEnqueued) {
+            // We expect one executor failure to trigger many FetchFailures in rapid succession,
+            // but all of those task failures can typically be handled by a single resubmission of
+            // the failed stage. We avoid flooding the scheduler's event queue with resubmit
+            // messages by checking whether a resubmit is already in the event queue for the
+            // failed stage. If there is already a resubmit enqueued for a different failed
+            // stage, that event would also be sufficient to handle the current failed stage, but
+            // producing a resubmit for each failed stage makes debugging and logging a little
+            // simpler while not producing an overwhelming number of scheduler events.
+            logInfo(
+              s"Resubmitting $mapStage (${mapStage.name}) and " +
+              s"$failedStage (${failedStage.name}) due to fetch failure"
+            )
+            messageScheduler.schedule(
```

--- End diff --

Ah, sorry for ascribing the prior comment to your preferences. That comment actually did make sense a long time ago, when the resubmitting of stages really was done periodically by an Akka scheduled event that fired every so many seconds. I'm pretty sure the RESUBMIT_TIMEOUT stuff is also legacy code that no longer makes sense and isn't necessary any more. So, do you want to do the follow-up PR to get rid of it, or shall I?
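The debouncing idea under review (many rapid FetchFailures for one stage should produce at most one resubmit event in the scheduler's queue) can be sketched in isolation. This is a toy Python model, not Spark's actual scheduler; the class and method names are made up for illustration.

```python
from collections import deque

class MiniScheduler:
    """Toy sketch of the resubmit debounce: a resubmit event is enqueued
    only when the failed stage is not already awaiting resubmission."""

    def __init__(self):
        self.failed_stages = set()   # stages awaiting resubmission
        self.event_queue = deque()   # pending scheduler events

    def on_fetch_failure(self, failed_stage, map_stage):
        # A resubmit is already enqueued for this stage iff it is tracked.
        no_resubmit_enqueued = failed_stage not in self.failed_stages
        self.failed_stages.add(failed_stage)
        self.failed_stages.add(map_stage)
        if no_resubmit_enqueued:
            self.event_queue.append("ResubmitFailedStages")

    def resubmit(self):
        # One event drains every failed stage, however many failures occurred.
        self.event_queue.popleft()
        stages, self.failed_stages = self.failed_stages, set()
        return sorted(stages)
```

A hundred fetch failures for the same stage still enqueue a single event, which is the behavior the diff's `noResubmitEnqueued` check guarantees.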
[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15434 LGTM
[GitHub] spark pull request #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should a...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15434#discussion_r82944802

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---

```diff
@@ -459,11 +459,20 @@ class SessionCatalog(
    * If a database is specified in `oldName`, this will rename the table in that database.
    * If no database is specified, this will first attempt to rename a temporary table with
    * the same name, then, if that does not exist, rename the table in the current database.
+   *
+   * This assumes the database specified in `newName` matches the one in `oldName`.
    */
-  def renameTable(oldName: TableIdentifier, newName: String): Unit = synchronized {
+  def renameTable(oldName: TableIdentifier, newName: TableIdentifier): Unit = synchronized {
     val db = formatDatabaseName(oldName.database.getOrElse(currentDb))
+    newName.database.map(formatDatabaseName).foreach { newDb =>
```

--- End diff --

uh, I see. If this is by design, I do not have more questions. LGTM
[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r82944265

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala ---

```diff
@@ -99,5 +102,182 @@ case class GenerateExec(
       }
     }
   }
-}
+
+  override def inputRDDs(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].inputRDDs()
+  }
+
+  protected override def doProduce(ctx: CodegenContext): String = {
+    // We need to add some code here for terminating generators.
+    child.asInstanceOf[CodegenSupport].produce(ctx, this)
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
+    ctx.currentVars = input
+    ctx.copyResult = true
+
+    // Add input rows to the values when we are joining
+    val values = if (join) {
+      input
+    } else {
+      Seq.empty
+    }
+
+    // Generate the driving expression.
+    val data = boundGenerator.genCode(ctx)
+
+    boundGenerator match {
+      case e: CollectionGenerator => codeGenCollection(ctx, e, values, data, row)
+      case g => codeGenTraversableOnce(ctx, g, values, data, row)
+    }
+  }
+
+  /**
+   * Generate code for [[CollectionGenerator]] expressions.
+   */
+  private def codeGenCollection(
+      ctx: CodegenContext,
+      e: CollectionGenerator,
+      input: Seq[ExprCode],
+      data: ExprCode,
+      row: ExprCode): String = {
+
+    // Generate looping variables.
+    val index = ctx.freshName("index")
+
+    // Add a check if the generate outer flag is true.
+    val checks = optionalCode(outer, data.isNull)
+
+    // Add position
+    val position = if (e.position) {
+      Seq(ExprCode("", "false", index))
+    } else {
+      Seq.empty
+    }
+
+    // Generate code for either ArrayData or MapData
+    val (initMapData, updateRowData, values) = e.collectionSchema match {
+      case ArrayType(st: StructType, nullable) if e.inline =>
+        val row = codeGenAccessor(ctx, data.value, "col", index, st, nullable, checks)
+        val fieldChecks = checks ++ optionalCode(nullable, row.isNull)
+        val columns = st.fields.toSeq.zipWithIndex.map { case (f, i) =>
+          codeGenAccessor(ctx, row.value, f.name, i.toString, f.dataType, f.nullable, fieldChecks)
+        }
+        ("", row.code, columns)
+
+      case ArrayType(dataType, nullable) =>
+        ("", "", Seq(codeGenAccessor(ctx, data.value, "col", index, dataType, nullable, checks)))
+
+      case MapType(keyType, valueType, valueContainsNull) =>
+        // Materialize the key and the value arrays before we enter the loop.
+        val keyArray = ctx.freshName("keyArray")
+        val valueArray = ctx.freshName("valueArray")
+        val initArrayData =
+          s"""
+             |ArrayData $keyArray = ${data.isNull} ? null : ${data.value}.keyArray();
+             |ArrayData $valueArray = ${data.isNull} ? null : ${data.value}.valueArray();
+           """.stripMargin
+        val values = Seq(
+          codeGenAccessor(ctx, keyArray, "key", index, keyType, nullable = false, checks),
+          codeGenAccessor(ctx, valueArray, "value", index, valueType, valueContainsNull, checks))
+        (initArrayData, "", values)
+    }
+
+    // In case of outer we need to make sure the loop is executed at-least once when the array/map
+    // contains no input. We do this by setting the looping index to -1 if there is no input,
+    // evaluation of the array is prevented by a check in the accessor code.
+    val numElements = ctx.freshName("numElements")
+    val init = if (outer) s"$numElements == 0 ? -1 : 0" else "0"
+    val numOutput = metricTerm(ctx, "numOutputRows")
+    s"""
+       |${data.code}
+       |$initMapData
+       |int $numElements = ${data.isNull} ? 0 : ${data.value}.numElements();
+       |for (int $index = $init; $index < $numElements; $index++) {
+       |  $numOutput.add(1);
+       |  $updateRowData
+       |  ${consume(ctx, input ++ position ++ values)}
+       |}
+     """.stripMargin
+  }
+
+  /**
+   * Generate code for a regular [[TraversableOnce]] returning [[Generator]].
+   */
+  private def codeGenTraversableOnce(
+      ctx: CodegenContext,
+      e: Expression,
+      input: Seq[ExprCode],
+      data: ExprCode,
+      row: ExprCode): String = {
+
+    // Generate looping variables.
+    val iterator = ctx.freshName("iterator")
+    val hasNext = ctx.freshName("hasNext")
+    val current = ctx.freshName("row")
+
+    // Add a check
```
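The "outer" handling described in the diff's comments (run the loop at least once when the collection is empty or null, emitting nulls, via the `index = -1` trick) can be modelled outside of code generation. This Python sketch is illustrative only; the function and its parameters are hypothetical, not Spark APIs.

```python
def generate(rows, collection_of, outer=False, join=False):
    """Sketch of Generate semantics: explode a per-row collection into
    output rows. With outer=True an empty or missing collection still
    yields one row carrying None, mirroring the generated loop that runs
    once with index -1 when numElements is 0."""
    out = []
    for row in rows:
        elems = collection_of(row)
        if not elems:
            if outer:
                # Loop body runs "at least once": emit a null element.
                out.append(row + (None,) if join else (None,))
            continue
        for e in elems:
            # join=True prepends the input row, like the `values` prefix.
            out.append(row + (e,) if join else (e,))
    return out
```

Without `outer`, rows whose collection is empty disappear from the output; with `outer` they survive as a single null-extended row.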
[GitHub] spark issue #15230: [SPARK-17657] [SQL] Disallow Users to Change Table Type
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15230 **[Test build #66798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66798/consoleFull)** for PR 15230 at commit [`e19536c`](https://github.com/apache/spark/commit/e19536c3c645b70f6cf1df747a7798188acf2935).
[GitHub] spark issue #15433: [SPARK-17822] Use weak reference in JVMObjectTracker.obj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15433 **[Test build #66797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66797/consoleFull)** for PR 15433 at commit [`7d50d84`](https://github.com/apache/spark/commit/7d50d84f90fcda9e5dec79c9be834870c83443c4).
[GitHub] spark issue #15230: [SPARK-17657] [SQL] Disallow Users to Change Table Type
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15230 retest this please
[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82943152

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---

```diff
@@ -2474,25 +2478,42 @@ private[spark] class CallerContext(
   val context = "SPARK_" + from + appIdStr + appAttemptIdStr + jobIdStr + stageIdStr +
     stageAttemptIdStr + taskIdStr + taskAttemptNumberStr
 
+  lazy val conf = new Configuration
+
   /**
    * Set up the caller context [[context]] by invoking Hadoop CallerContext API of
    * [[org.apache.hadoop.ipc.CallerContext]], which was added in hadoop 2.8.
    */
   def setCurrentContext(): Boolean = {
-    var succeed = false
-    try {
-      // scalastyle:off classforname
-      val callerContext = Class.forName("org.apache.hadoop.ipc.CallerContext")
-      val Builder = Class.forName("org.apache.hadoop.ipc.CallerContext$Builder")
-      // scalastyle:on classforname
-      val builderInst = Builder.getConstructor(classOf[String]).newInstance(context)
-      val hdfsContext = Builder.getMethod("build").invoke(builderInst)
-      callerContext.getMethod("setCurrent", callerContext).invoke(null, hdfsContext)
-      succeed = true
-    } catch {
-      case NonFatal(e) => logInfo("Fail to set Spark caller context", e)
+    if (!CallerContext.callerContextSupported) {
+      false
+    } else {
+      if (!conf.getBoolean("hadoop.caller.context.enabled", false)) {
+        logInfo("Hadoop caller context is not enabled")
+        CallerContext.callerContextSupported = false
+        false
+      } else {
+        try {
+        // scalastyle:off classforname
```

--- End diff --

Nit: indent is not correct, use 2 ws.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66796/ Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66796/consoleFull)** for PR 9766 at commit [`e9832f6`](https://github.com/apache/spark/commit/e9832f6c3dbbf9649333af5ab9a0a0fd0954c237).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #8318: [SPARK-1267][PYSPARK] Adds pip installer for pyspark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/8318 I'd be happy to take it from where @jhlch is at - I've got some bandwidth available to work on additional PySpark stuff and it seems like the interest on the committer side is here now so I'd love to help make this happen :)
[GitHub] spark issue #15448: [SPARK-17108][SQL]: Fix BIGINT and INT comparison failur...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/15448 Also add unit tests please.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #15448: [SPARK-17108][SQL]: Fix BIGINT and INT comparison failur...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15448 Can one of the admins verify this patch?
[GitHub] spark issue #15448: [SPARK-17108][SQL]: Fix BIGINT and INT comparison failur...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/15448 Could you try to fix this by adding implicit casting to the `GetMapValue` (make it extend `ImplicitCastInputTypes` instead of `ExpectsInputTypes`)?
[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82942829

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---

```diff
@@ -2474,25 +2478,42 @@ private[spark] class CallerContext(
   val context = "SPARK_" + from + appIdStr + appAttemptIdStr + jobIdStr + stageIdStr +
     stageAttemptIdStr + taskIdStr + taskAttemptNumberStr
 
+  lazy val conf = new Configuration
```

--- End diff --

Please use `SparkHadoopUtils#conf`.
[GitHub] spark issue #8318: [SPARK-1267][PYSPARK] Adds pip installer for pyspark
Github user jhlch commented on the issue: https://github.com/apache/spark/pull/8318 I've got [a branch that has a solid first pass at making pyspark pip installable.](https://github.com/apache/spark/compare/master...jhlch:pipinstall) A few questions are:

* How does this integrate with the typical build? Once the jar is built, it needs to be put in a location pointed to by setup.py and MANIFEST.in.
* What version requirements are there for numpy and pandas? I'm not confident that the ones I list are correct or as specific as they could be.
* Setup automated testing:
  * run-tests and run-tests.py should use environments where pyspark has been pip installed, and remove the 'find jars' etc. thing they currently do.
  * testpypi exists and could be useful in CI to make sure packaging and distribution never break. CI python envs could be initialized using `pip install --extra-index-url https://testpypi.python.org/pypi pyspark`

I've got too much on my plate to see this to the finish line in the next few months, but I do want to see this happen. Is someone else willing to take it from here? If not, I'll come back to it in Dec/Jan.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66796/consoleFull)** for PR 9766 at commit [`e9832f6`](https://github.com/apache/spark/commit/e9832f6c3dbbf9649333af5ab9a0a0fd0954c237).
[GitHub] spark pull request #15448: [SPARK-17108][SQL]: Fix BIGINT and INT comparison...
GitHub user weiqingy opened a pull request: https://github.com/apache/spark/pull/15448

[SPARK-17108][SQL]: Fix BIGINT and INT comparison failure in spark sql

## What changes were proposed in this pull request?

Add a function to check if two integers are compatible when invoking `acceptsType()` in `DataType`.

## How was this patch tested?

Manually. E.g.

```
spark.sql("create table t3(a map>)")
spark.sql("select * from t3 where a[1] is not null")
```

Before:

```
cannot resolve 't.`a`[1]' due to data type mismatch: argument 2 requires bigint type, however, '1' is of int type.; line 1 pos 22
org.apache.spark.sql.AnalysisException: cannot resolve 't.`a`[1]' due to data type mismatch: argument 2 requires bigint type, however, '1' is of int type.; line 1 pos 22
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:82)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:307)
```

After: Passed the sql query. No error above.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weiqingy/spark SPARK_17108

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15448.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15448

commit ec3d55296abc9f355a0f0db0f40e04abb4b58d94
Author: Weiqing Yang
Date: 2016-10-12T06:14:48Z

[SPARK-17108][SQL]: Fix BIGINT and INT comparison failure in spark sql
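The fix proposed here amounts to loosening the key-type check so an `int` literal can index a `bigint`-keyed map, i.e. allowing lossless widening between integral types. A standalone sketch of that compatibility rule (function and type names are hypothetical, not Spark's actual `acceptsType()`):

```python
# Byte widths of the integral SQL types involved in the check.
INTEGRAL_WIDTHS = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}

def accepts_key_type(required, provided):
    """A map keyed by `required` can accept a `provided` key when the
    types match exactly, or when both are integral and the provided
    value widens into the required type without loss."""
    if required == provided:
        return True
    if required in INTEGRAL_WIDTHS and provided in INTEGRAL_WIDTHS:
        return INTEGRAL_WIDTHS[provided] <= INTEGRAL_WIDTHS[required]
    return False
```

Under this rule `a[1]` against a `bigint`-keyed map is accepted (the `int` literal widens to `bigint`), while the reverse direction and non-integral keys are still rejected, which is consistent with hvanhovell's suggestion of implicit input casting.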
[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82942438

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---

```diff
@@ -2432,6 +2432,10 @@ private[spark] object Utils extends Logging {
   }
 }
 
+private[util] object CallerContext {
+  var callerContextSupported: Boolean = true
```

--- End diff --

What is the usage of this flag? I don't see any other place use it, all just setters.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66795/consoleFull)** for PR 15307 at commit [`4c08d56`](https://github.com/apache/spark/commit/4c08d569f7817e222550ef7578c6e01f90bc4ee0).
[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @felixcheung and @wangmiao1981 thanks! That's a good point. I will try testing it on different versions of R.
[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15421#discussion_r82940884

--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---

```diff
@@ -125,15 +125,34 @@ private[spark] object SerDe {
   }
 
   def readDate(in: DataInputStream): Date = {
-    Date.valueOf(readString(in))
+    try {
+      val inStr = readString(in)
+      if (inStr == "NA") {
+        null
+      } else {
+        Date.valueOf(inStr)
+      }
+    } catch {
+      // On windows we get NegativeArraySizeException for NAs in R
```

--- End diff --

No. I will revert this change.
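The `NA` handling in the diff above boils down to: check for R's `NA` sentinel string before attempting date parsing, rather than catching a parser exception afterwards. A minimal Python sketch of the same idea (the function name is hypothetical, not Spark's SerDe):

```python
from datetime import date

def read_date(raw):
    """Return None for R's NA sentinel; otherwise parse an ISO date,
    mirroring the early 'NA' check added to SerDe.readDate."""
    if raw == "NA":
        return None
    return date.fromisoformat(raw)
```

Checking the sentinel up front keeps the exception handler reserved for genuinely malformed input instead of routinely swallowing NA values.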
[GitHub] spark issue #15447: [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint...
Github user apivovarov commented on the issue: https://github.com/apache/spark/pull/15447 Related PRs https://github.com/apache/spark/pull/15396 https://github.com/apache/spark/pull/12576
[GitHub] spark issue #15447: [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15447 Can one of the admins verify this patch?
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15375 Merged build finished. Test PASSed.
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15375 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66792/ Test PASSed.
[GitHub] spark pull request #15447: [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD che...
GitHub user apivovarov opened a pull request: https://github.com/apache/spark/pull/15447 [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint results Clas… EdgeRDD/VertexRDD wraps partitionsRDD e.g. `EdgeRDDImpl.checkpoint()` calls `partitionsRDD.checkpoint()` EdgeRDD/VertexRDD `isCheckpointed()` method should be implemented the same way - it should call `partitionsRDD.isCheckpointed` You can merge this pull request into a Git repository by running: $ git pull https://github.com/apivovarov/spark 14804 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15447.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15447 commit b123b68589d59d65db6210f1792a48d7f94e09bb Author: Alexander Pivovarov Date: 2016-10-12T05:48:37Z [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint results ClassCastException
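The delegation this PR calls for can be illustrated with minimal stand-in classes (hypothetical names; the real `EdgeRDDImpl`/`VertexRDDImpl` wrap a Spark `partitionsRDD`). The point is that a wrapper which forwards `checkpoint()` but not `isCheckpointed` keeps reporting `false` after checkpointing:

```scala
// Minimal stand-ins: the wrapper must forward isCheckpointed to
// partitionsRDD, just as checkpoint() already forwards to it.
class FakePartitionsRDD {
  private var checkpointed = false
  def checkpoint(): Unit = { checkpointed = true }
  def isCheckpointed: Boolean = checkpointed
}

class FakeEdgeRDD(val partitionsRDD: FakePartitionsRDD) {
  def checkpoint(): Unit = partitionsRDD.checkpoint()
  def isCheckpointed: Boolean = partitionsRDD.isCheckpointed // the fix
}
```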
[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15445 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66789/ Test PASSed.
[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15445 Merged build finished. Test PASSed.
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15375 **[Test build #66792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66792/consoleFull)** for PR 15375 at commit [`836e874`](https://github.com/apache/spark/commit/836e8745c346c59f78958e10aec1c6f9537242b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15445 **[Test build #66789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66789/consoleFull)** for PR 15445 at commit [`be6d153`](https://github.com/apache/spark/commit/be6d1537e9bbd2cc2484e4d8da9d901b16725c97). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66794/consoleFull)** for PR 9766 at commit [`45a9b7a`](https://github.com/apache/spark/commit/45a9b7af6afbb2ab1287cc41fafbaa1de823eafa). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66794/ Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66794/consoleFull)** for PR 9766 at commit [`45a9b7a`](https://github.com/apache/spark/commit/45a9b7af6afbb2ab1287cc41fafbaa1de823eafa).
[GitHub] spark pull request #15230: [SPARK-17657] [SQL] Disallow Users to Change Tabl...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15230#discussion_r82940270 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -225,6 +225,11 @@ case class AlterTableSetPropertiesCommand( val catalog = sparkSession.sessionState.catalog val table = catalog.getTableMetadata(tableName) DDLUtils.verifyAlterTableType(catalog, table, isView) +// Not allowed to switch the table type. +if (properties.contains("EXTERNAL")) { --- End diff -- This is officially documented in the Hive documentation, as shown in the [link](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL): `TBLPROPERTIES ("EXTERNAL"="TRUE") in release 0.6.0+ (HIVE-1329) – Change a managed table to an external table and vice versa for "FALSE".` This is the only Hive property users are not allowed to change; users may still change the other Hive-specific properties, because Hive also allows that. Users are not allowed to change our Spark-reserved properties either. See the function call `verifyTableProperties` in `[alterTable](https://github.com/apache/spark/blob/b9a147181d5e38d9abed0c7215f4c5cb695f579c/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L393)`.
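The guard in the diff above can be sketched on its own. The helper name here is hypothetical (the real check lives inline in `AlterTableSetPropertiesCommand`), but it mirrors the `properties.contains("EXTERNAL")` condition:

```scala
// Reject any attempt to flip a table between managed and external
// through TBLPROPERTIES, as the diff above does.
def checkNoExternalFlip(properties: Map[String, String]): Unit =
  require(!properties.contains("EXTERNAL"),
    "Cannot change a managed table to an external table (or back) via TBLPROPERTIES")
```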
[GitHub] spark issue #15173: [SPARK-15698][SQL][Streaming][Follw-up]Fix FileStream so...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15173 @zsxwing Why was this not merged to 2.0?
[GitHub] spark pull request #15439: [SPARK-17880][DOC] The url linking to `Accumulato...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15439
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15427 Merged build finished. Test PASSed.
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66790/ Test PASSed.
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15427 **[Test build #66790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66790/consoleFull)** for PR 15427 at commit [`81339dc`](https://github.com/apache/spark/commit/81339dc429104633ee28cf078f643b5050564557). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15439 Thanks - merging in master/2.0.
[GitHub] spark pull request #15440: Fix hadoop.version in building-spark.md
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15440
[GitHub] spark issue #15440: Fix hadoop.version in building-spark.md
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15440 Thanks - merging in master/branch-2.0.
[GitHub] spark pull request #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15434#discussion_r82938529 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -459,11 +459,20 @@ class SessionCatalog( * If a database is specified in `oldName`, this will rename the table in that database. * If no database is specified, this will first attempt to rename a temporary table with * the same name, then, if that does not exist, rename the table in the current database. + * + * This assumes the database specified in `newName` matches the one in `oldName`. */ - def renameTable(oldName: TableIdentifier, newName: String): Unit = synchronized { + def renameTable(oldName: TableIdentifier, newName: TableIdentifier): Unit = synchronized { val db = formatDatabaseName(oldName.database.getOrElse(currentDb)) +newName.database.map(formatDatabaseName).foreach { newDb => --- End diff -- See the PR description: we should use the database of the source table, so that users can just write `db.tbl1 RENAME TO tbl2`. This is different from Hive, as we don't support moving a table from one database to another.
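The database-consistency rule described here can be sketched as a standalone check (hypothetical helper name; the real logic sits inside `SessionCatalog.renameTable`): a target database, if given, must equal the source table's database.

```scala
// If RENAME TO specifies a database, it must match the source table's
// database, since moving a table across databases is not supported.
def checkRenameTarget(oldDb: String, newDb: Option[String]): Unit =
  newDb.foreach { db =>
    require(db == oldDb,
      s"RENAME TO cannot move table from database '$oldDb' to '$db'")
  }
```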
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82938410 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { // Returns true if the plan is supposed to be sorted. def isSorted(plan: LogicalPlan): Boolean = plan match { case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false + case _: ShowColumnsCommand => true --- End diff -- @cloud-fan @viirya Thanks :-) I will change it.
[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15434 Just FYI. Hive allows the following changes: ```SQL ALTER TABLE db1.tbl RENAME TO db2.tbl2 ```
[GitHub] spark issue #15406: [Spark-17745][ml][PySpark] update NB python api - add we...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15406 We should add weights to the doctests to demonstrate them and make sure they're working.
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82937473 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { // Returns true if the plan is supposed to be sorted. def isSorted(plan: LogicalPlan): Boolean = plan match { case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false + case _: ShowColumnsCommand => true --- End diff -- +1 as mentioned in previous comment.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15307 @marmbrus Could you take a look?
[GitHub] spark issue #11610: [SPARK-13777] [ML] Remove constant features from trainin...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/11610 This problem should be handled by https://github.com/apache/spark/pull/15394 if it is merged. It seems this is no longer active, and we are pursuing alternative solutions. Shall we close this?
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82937255 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { // Returns true if the plan is supposed to be sorted. def isSorted(plan: LogicalPlan): Boolean = plan match { case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false + case _: ShowColumnsCommand => true --- End diff -- Marking `ShowColumnsCommand` as sorted is more weird; I'd like to leave the result sorted.
[GitHub] spark issue #9008: [SPARK-9478] [ml] Add class weights to Random Forest
Github user sethah commented on the issue: https://github.com/apache/spark/pull/9008 @rotationsymmetry Could you please close this?
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15375 @falaki @felixcheung The DirectKafkaStreamSuite is a known flaky test. Nothing in this patch should affect Kafka.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82931901 --- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml + +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.util._ +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.sql.{DataFrame, Dataset} +import org.apache.spark.sql.types._ + +class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest { + + import testImplicits._ + + class MockPredictor(override val uid: String) --- End diff -- move into companion object.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82932068 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala --- @@ -121,10 +122,18 @@ abstract class Predictor[ * and put it in an RDD with strong types. */ protected def extractLabeledPoints(dataset: Dataset[_]): RDD[LabeledPoint] = { -dataset.select(col($(labelCol)).cast(DoubleType), col($(featuresCol))).rdd.map { +dataset.select(col($(labelCol)), col($(featuresCol))).rdd.map { case Row(label: Double, features: Vector) => LabeledPoint(label, features) } } + + /** + * Return the given DataFrame, with [[labelCol]] casted to DoubleType. + */ +protected def castDataSet(dataset: Dataset[_]): DataFrame = { --- End diff -- let's just put this logic directly in `fit`
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82935295

--- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.types._
+
+class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  class MockPredictor(override val uid: String)
+    extends Predictor[Vector, MockPredictor, MockPredictionModel] {
+
+    override def train(dataset: Dataset[_]): MockPredictionModel = {
+      require(dataset.schema("label").dataType == DoubleType)
+      new MockPredictionModel(uid)
+    }
+
+    override def copy(extra: ParamMap): MockPredictor = defaultCopy(extra)
+  }
+
+  class MockPredictionModel(override val uid: String)
+    extends PredictionModel[Vector, MockPredictionModel] {
+
+    override def predict(features: Vector): Double = 1.0

--- End diff --

`override def predict(features: Vector): Double = throw new NotImplementedError()`

We can do this for everything except `train`.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82932894

--- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.types._
+
+class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  class MockPredictor(override val uid: String)
+    extends Predictor[Vector, MockPredictor, MockPredictionModel] {
+
+    override def train(dataset: Dataset[_]): MockPredictionModel = {
+      require(dataset.schema("label").dataType == DoubleType)
+      new MockPredictionModel(uid)
+    }
+
+    override def copy(extra: ParamMap): MockPredictor = defaultCopy(extra)
+  }
+
+  class MockPredictionModel(override val uid: String)
+    extends PredictionModel[Vector, MockPredictionModel] {
+
+    override def predict(features: Vector): Double = 1.0
+
+    override def copy(extra: ParamMap): MockPredictionModel = defaultCopy(extra)
+  }
+
+  test("should support all NumericType labels and not support other types") {
+    val predictor = new MockPredictor("mock")
+    MLTestingUtils.checkNumericTypes[MockPredictionModel, MockPredictor](

--- End diff --

Why don't we just cycle through the types here and call `fit`? I think it's a bit confusing the way it is now.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82932799

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -117,7 +117,7 @@ object MLTestingUtils extends SparkFunSuite {
       Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0))
     types.map { t =>
       val castDF = df.select(col(labelColName).cast(t), col(featuresColName))
-      t -> TreeTests.setMetadata(castDF, 2, labelColName, featuresColName)
+      t -> TreeTests.setMetadata(castDF, 0, labelColName, featuresColName)

--- End diff --

What is this for? If the intent is to force `getNumClasses` to infer the number of classes, then you're no longer testing the non-inferred case. Further, the point of this PR is to eliminate the need to do that, since it is not a robust solution, IMO. Also, I'd like to remove the dependence on `TreeTests` here (and `genRegressionDF`) and just explicitly set the attributes in the functions.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Merged build finished. Test PASSed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66786/
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172

**[Test build #66786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66786/consoleFull)** for PR 15172 at commit [`46b52e6`](https://github.com/apache/spark/commit/46b52e63918376dcf5dde0359fdfe1efa2456dfd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test PASSed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Merged build finished. Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66784/
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66785/
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307

**[Test build #66784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66784/consoleFull)** for PR 15307 at commit [`35bf508`](https://github.com/apache/spark/commit/35bf5089f0d79ba0ba007ca9983a75616f1a553d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172

**[Test build #66785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66785/consoleFull)** for PR 15172 at commit [`0bf663f`](https://github.com/apache/spark/commit/0bf663f0d8a71b2944d4030dc0ef95e36ee35471).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15446 @shivaram yes, I just noticed it during my debugging and fixed it.
[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82933318

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1255,27 +1255,46 @@ class DAGScheduler(
             s"longer running")
         }

-        if (disallowStageRetryForTest) {
-          abortStage(failedStage, "Fetch failure will not retry stage due to testing config",
-            None)
-        } else if (failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId)) {
-          abortStage(failedStage, s"$failedStage (${failedStage.name}) " +
-            s"has failed the maximum allowable number of " +
-            s"times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}. " +
-            s"Most recent failure reason: ${failureMessage}", None)
-        } else {
-          if (failedStages.isEmpty) {
-            // Don't schedule an event to resubmit failed stages if failed isn't empty, because
-            // in that case the event will already have been scheduled.
-            // TODO: Cancel running tasks in the stage
-            logInfo(s"Resubmitting $mapStage (${mapStage.name}) and " +
-              s"$failedStage (${failedStage.name}) due to fetch failure")
-            messageScheduler.schedule(new Runnable {
-              override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
-            }, DAGScheduler.RESUBMIT_TIMEOUT, TimeUnit.MILLISECONDS)
+        val shouldAbortStage =
+          failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) ||
+          disallowStageRetryForTest
+
+        if (shouldAbortStage) {
+          val abortMessage = if (disallowStageRetryForTest) {
+            "Fetch failure will not retry stage due to testing config"
+          } else {
+            s"""$failedStage (${failedStage.name})
+               |has failed the maximum allowable number of
+               |times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}.
+               |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
           }
+          abortStage(failedStage, abortMessage, None)
+        } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
+          // TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
+          val noResubmitEnqueued = !failedStages.contains(failedStage)

--- End diff --

I think I was worried about the opposite problem -- perhaps we add `mapStage` to `failedStages`, but fail to fire a `Resubmit` event. Maybe too many negatives to think through this clearly -- my intention was *more* logging & resubmission, not less. I suppose I was thinking of it as:

```scala
val addedToFailedStages = failedStages.add(failedStage) | failedStages.add(mapStage)
if (addedToFailedStages) {
  logStuff()
  resubmit()
}
```

the point being, to avoid another case of the bug which started this all -- adding to `failedStages` but failing to ever `Resubmit`.

I was thinking of something more like this (though as you'll see, this case is fine). Say you have two jobs submitted concurrently, which share the first few stages: A -> B -> C and A -> B -> D. There is an executor failure while they are both running their independent parts, C and D, concurrently. The failure is detected in C first, so it marks B and C as failed. Later on, the failure is detected in D, so it marks B and D as failed. If the first resubmit was already processed, it's fine: B is already running, and we mark D as waiting on B. Similarly, it's fine if the resubmit wasn't processed yet when the failure is detected in D -- then when the resubmit is processed, we resubmit all three stages. I think it also works out even if stage A needs to get resubmitted as well -- it's handled in the same call that does the resubmit for B, when it checks for missing parents. (In fact, thinking through these cases makes me think we don't even need to resubmit the `mapStage` at all -- the `failedStage` will submit itself on its resubmit, since it will notice its parents aren't ready, which is why there isn't a case where this check would really matter.)

Anyway, the point is not that I can show you a case where we *do* need to make sure there is a resubmit. The point is that I'm *not* sure that we do *not* need it, which is why I thought it was better to err on the side of over-logging / resubmitting.
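The non-short-circuiting `|` in the reviewer's sketch is doing real work: unlike `||`, it evaluates both `add` calls, so the map stage is recorded even when the failed stage was newly added. A minimal, self-contained illustration with hypothetical integer stage ids standing in for real `Stage` objects:

```scala
import scala.collection.mutable

object FailedStagesDemo {
  def main(args: Array[String]): Unit = {
    // With short-circuiting ||, the second add never runs once the first returns true,
    // so the map stage (2) would be silently dropped.
    val shortCircuit = mutable.HashSet[Int]()
    val ignored = shortCircuit.add(1) || shortCircuit.add(2)
    println(shortCircuit.contains(2)) // the map stage was never recorded

    // With non-short-circuiting |, both adds always run.
    val both = mutable.HashSet[Int]()
    val addedToFailedStages = both.add(1) | both.add(2)
    println(both.contains(2))
    println(addedToFailedStages) // something new was enqueued, so fire a Resubmit
  }
}
```

`mutable.HashSet.add` returns `true` only when the element was not already present, which is what makes the combined boolean a usable "did we enqueue anything new?" flag.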
[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932947

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -588,6 +588,12 @@ object SQLConf {
     .doubleConf
     .createWithDefault(0.05)

+  val IGNORE_CORRUPT_FILES = SQLConfigBuilder("spark.sql.files.ignoreCorruptFiles")
+    .doc("Whether to ignore corrupt files. If true, the Spark jobs will continue to run when " +
+      "encountering corrupt files and contents that have been read will still be returned.")
+    .booleanConf
+    .createWithDefault(false)
+

--- End diff --

Curious why we are duplicating the parameter in the sql namespace. Won't `spark.files.ignoreCorruptFiles` do?
[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82933077

--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -170,4 +170,9 @@ package object config {
     .doc("Port to use for the block managed on the driver.")
     .fallbackConf(BLOCK_MANAGER_PORT)

+  private[spark] val IGNORE_CORRUPT_FILES = ConfigBuilder("spark.files.ignoreCorruptFiles")
+    .doc("Whether to ignore corrupt files. If true, the Spark jobs will continue to run when " +
+      "encountering corrupt files and contents that have been read will still be returned.")
+    .booleanConf
+    .createWithDefault(false)

--- End diff --

So either way we will have a behavioral change -- in NewHadoopRDD vs. HadoopRDD. IMO that is fine, given that we are standardizing the behavior and this was a corner case anyway. Setting the default to false makes sense.
[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932992

--- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala ---
@@ -179,7 +183,16 @@ class NewHadoopRDD[K, V](
       override def hasNext: Boolean = {
         if (!finished && !havePair) {
-          finished = !reader.nextKeyValue
+          try {
+            finished = !reader.nextKeyValue
+          } catch {
+            case e: IOException =>
+              if (ignoreCorruptFiles) {
+                finished = true
+              } else {
+                throw e
+              }
+          }

--- End diff --

Thanks for changing this too!
[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932645

--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -253,8 +256,12 @@ class HadoopRDD[K, V](
       try {
         finished = !reader.next(key, value)
       } catch {
-        case eof: EOFException =>
-          finished = true
+        case e: IOException =>
+          if (ignoreCorruptFiles) {
+            finished = true
+          } else {
+            throw e
+          }

--- End diff --

nit: `case e: IOException if ignoreCorruptFiles =>` would have been more concise.
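The nit relies on Scala's pattern guards: a `catch` case with an `if` guard only matches when the guard holds, so when `ignoreCorruptFiles` is false the `IOException` simply propagates with no explicit re-throw. A self-contained sketch of that behavior, using a hypothetical `finishedAfterRead` helper rather than the actual `HadoopRDD` iterator:

```scala
import java.io.IOException

object IgnoreCorruptDemo {
  // Returns true when the record stream is finished; with the flag set,
  // a corrupt read is treated as end-of-stream instead of failing the task.
  def finishedAfterRead(ignoreCorruptFiles: Boolean)(read: () => Boolean): Boolean =
    try !read() catch {
      // Guarded case: only matches when ignoreCorruptFiles is true;
      // otherwise the IOException escapes on its own.
      case _: IOException if ignoreCorruptFiles => true
    }

  def main(args: Array[String]): Unit = {
    val corrupt: () => Boolean = () => throw new IOException("corrupt block")

    println(finishedAfterRead(ignoreCorruptFiles = true)(corrupt))

    val propagated =
      try { finishedAfterRead(ignoreCorruptFiles = false)(corrupt); false }
      catch { case _: IOException => true }
    println(propagated) // without the flag, the exception escapes unchanged
  }
}
```

The guarded form is more concise than the `if/else`-with-rethrow in the diff and avoids rewrapping the exception's stack trace, since the unmatched exception is never caught in the first place.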
[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15444 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66787/
[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15444 Merged build finished. Test PASSed.
[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15444

**[Test build #66787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66787/consoleFull)** for PR 15444 at commit [`b98ccdf`](https://github.com/apache/spark/commit/b98ccdfd696cb89cb4793a140c87c498ce5c3086).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766

**[Test build #66793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66793/consoleFull)** for PR 9766 at commit [`dc6d5f9`](https://github.com/apache/spark/commit/dc6d5f927d93566ee1c3b935db864f2e517bc7e0).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66793/
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66793/consoleFull)** for PR 9766 at commit [`dc6d5f9`](https://github.com/apache/spark/commit/dc6d5f927d93566ee1c3b935db864f2e517bc7e0).
[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66782/
[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15443 Merged build finished. Test PASSed.
[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15443

**[Test build #66782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66782/consoleFull)** for PR 15443 at commit [`a843920`](https://github.com/apache/spark/commit/a843920983914de7efd21608b8f0e39c70b210d7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class StringHistogram(`
   * `case class StringHistogramInfo(`
   * `class StringHistogramInfoSerializer`
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15375 **[Test build #66792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66792/consoleFull)** for PR 15375 at commit [`836e874`](https://github.com/apache/spark/commit/836e8745c346c59f78958e10aec1c6f9537242b9).
[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r82931395

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ---

```diff
@@ -25,26 +25,25 @@ object StringUtils {
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-    if (!v.isEmpty) {
-      "(?s)" + (' ' +: v.init).zip(v).flatMap {
-        case (prev, '\\') => ""
-        case ('\\', c) =>
-          c match {
-            case '_' => "_"
-            case '%' => "%"
-            case _ => Pattern.quote("\\" + c)
-          }
-        case (prev, c) =>
-          c match {
-            case '_' => "."
-            case '%' => ".*"
-            case _ => Pattern.quote(Character.toString(c))
-          }
-      }.mkString
-    } else {
-      v
+  def escapeLikeRegex(str: String): String = {
+    val builder = new StringBuilder()
+    var escaping = false
+    for (next <- str) {
+      if (escaping) {
+        builder ++= Pattern.quote(Character.toString(next))
```

--- End diff --

`\Q\\E\Qa\E` is correct. But doesn't it become `\Qa\E` in this change? For `\\a`, the prefixing `\\` will go to the next branch and enable `escaping`. Then the next char `a` will be quoted here, so it becomes `\Qa\E`. BTW, before this change it would be `\Q\a\E`.
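For readers following the escaping question, the loop under discussion can be exercised outside Spark. The Java sketch below transcribes the new Scala loop, with the non-escape branches filled in to mirror the old match arms; treat it as an illustration of the behavior viirya describes (the class name and the filled-in branches are assumptions), not the final merged code.

```java
import java.util.regex.Pattern;

public class LikeEscapeSketch {
    // Translate a SQL LIKE pattern into a Java regex:
    // '_' matches exactly one character, '%' matches any run of characters,
    // and a backslash escapes the character that follows it.
    static String escapeLikeRegex(String str) {
        StringBuilder builder = new StringBuilder();
        boolean escaping = false;
        for (char next : str.toCharArray()) {
            if (escaping) {
                // An escaped character is always matched literally, so the
                // input "\\a" becomes \Qa\E -- the behavior viirya points out.
                builder.append(Pattern.quote(Character.toString(next)));
                escaping = false;
            } else if (next == '\\') {
                escaping = true;          // defer to the next character
            } else if (next == '_') {
                builder.append('.');      // any single character
            } else if (next == '%') {
                builder.append(".*");     // any sequence of characters
            } else {
                builder.append(Pattern.quote(Character.toString(next)));
            }
        }
        return "(?s)" + builder;
    }

    public static void main(String[] args) {
        System.out.println(escapeLikeRegex("\\a"));   // (?s)\Qa\E
        System.out.println(escapeLikeRegex("a%b"));   // (?s)\Qa\E.*\Qb\E
    }
}
```

Under this reading, the change drops the literal backslash from the quoted output (`\Qa\E` instead of `\Q\a\E`), which is harmless for matching since `Pattern.quote` already protects the character.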
[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15446 cc @falaki Is this also a part of #15375 ?
[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15446 Thanks @jrshust for the PR. Jenkins, ok to test
[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82931294

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---

```diff
@@ -1255,27 +1255,46 @@ class DAGScheduler(
           s"longer running")
       }

-      if (disallowStageRetryForTest) {
-        abortStage(failedStage, "Fetch failure will not retry stage due to testing config",
-          None)
-      } else if (failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId)) {
-        abortStage(failedStage, s"$failedStage (${failedStage.name}) " +
-          s"has failed the maximum allowable number of " +
-          s"times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}. " +
-          s"Most recent failure reason: ${failureMessage}", None)
-      } else {
-        if (failedStages.isEmpty) {
-          // Don't schedule an event to resubmit failed stages if failed isn't empty, because
-          // in that case the event will already have been scheduled.
-          // TODO: Cancel running tasks in the stage
-          logInfo(s"Resubmitting $mapStage (${mapStage.name}) and " +
-            s"$failedStage (${failedStage.name}) due to fetch failure")
-          messageScheduler.schedule(new Runnable {
-            override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
-          }, DAGScheduler.RESUBMIT_TIMEOUT, TimeUnit.MILLISECONDS)
+      val shouldAbortStage =
+        failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) ||
+        disallowStageRetryForTest
+
+      if (shouldAbortStage) {
+        val abortMessage = if (disallowStageRetryForTest) {
+          "Fetch failure will not retry stage due to testing config"
+        } else {
+          s"""$failedStage (${failedStage.name})
+             |has failed the maximum allowable number of
+             |times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}.
+             |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
         }
+        abortStage(failedStage, abortMessage, None)
+      } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
+        // TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
+        val noResubmitEnqueued = !failedStages.contains(failedStage)
         failedStages += failedStage
         failedStages += mapStage
+        if (noResubmitEnqueued) {
+          // We expect one executor failure to trigger many FetchFailures in rapid succession,
+          // but all of those task failures can typically be handled by a single resubmission of
+          // the failed stage. We avoid flooding the scheduler's event queue with resubmit
+          // messages by checking whether a resubmit is already in the event queue for the
+          // failed stage. If there is already a resubmit enqueued for a different failed
+          // stage, that event would also be sufficient to handle the current failed stage, but
+          // producing a resubmit for each failed stage makes debugging and logging a little
+          // simpler while not producing an overwhelming number of scheduler events.
+          logInfo(
+            s"Resubmitting $mapStage (${mapStage.name}) and " +
+            s"$failedStage (${failedStage.name}) due to fetch failure"
+          )
+          messageScheduler.schedule(
```

--- End diff --

yeah probably a separate PR, sorry this was just an opportunity for me to rant :) And sorry if I worded it poorly, but I was not suggesting the one w/ "Periodically" as a better comment -- in fact I think it's a *bad* comment; I just wanted to mention it was another description which used to be there long ago. This was my suggestion:

```
If we get one fetch-failure, we often get more fetch failures across multiple executors. We will get better parallelism when we resubmit the mapStage if we can resubmit when we know about as many of those failures as possible. So this is a heuristic to add a small delay to see if we gather a few more failures before we resubmit.
```
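The heuristic under discussion, deferring the resubmit briefly and deduplicating repeat failures of the same stage, can be sketched outside Spark. The class below is an illustration only, not Spark's `DAGScheduler`: the class name, the 200 ms delay, and the string-keyed stage set are all invented for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ResubmitCoalescer {
    private static final long RESUBMIT_DELAY_MS = 200; // illustrative delay

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "resubmit-timer");
            t.setDaemon(true); // don't keep the JVM alive for the timer
            return t;
        });
    private final Set<String> failedStages = new HashSet<>();
    private int resubmitRuns = 0;

    // Called once per fetch failure; repeated failures of the same stage
    // collapse into a single scheduled resubmit.
    public synchronized void onFetchFailure(String failedStage) {
        boolean noResubmitEnqueued = !failedStages.contains(failedStage);
        failedStages.add(failedStage);
        if (noResubmitEnqueued) {
            // Delay instead of resubmitting immediately: a dying executor tends
            // to produce a burst of fetch failures, and waiting briefly lets one
            // resubmission cover as many of them as possible.
            scheduler.schedule(this::resubmitFailedStages,
                RESUBMIT_DELAY_MS, TimeUnit.MILLISECONDS);
        }
    }

    private synchronized void resubmitFailedStages() {
        for (String stage : failedStages) {
            System.out.println("Resubmitting " + stage + " due to fetch failure");
        }
        failedStages.clear();
        resubmitRuns++;
    }

    public synchronized int resubmitRuns() {
        return resubmitRuns;
    }

    public static void main(String[] args) throws InterruptedException {
        ResubmitCoalescer coalescer = new ResubmitCoalescer();
        // Five failures in rapid succession produce a single "Resubmitting" line.
        for (int i = 0; i < 5; i++) {
            coalescer.onFetchFailure("ShuffleMapStage 4");
        }
        Thread.sleep(600);
    }
}
```

This captures the trade-off squito's suggested comment describes: the short delay is a heuristic to gather a few more failures before resubmitting, while the enqueue-once check keeps the event queue from flooding.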
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Build finished. Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66791/ Test FAILed.
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15375 Jenkins, retest this please
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766

**[Test build #66791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66791/consoleFull)** for PR 9766 at commit [`9de8c0e`](https://github.com/apache/spark/commit/9de8c0e7c0a2108b519c8adce7af5162578b04c9).

* This patch **fails RAT tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15427

**[Test build #66790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66790/consoleFull)** for PR 15427 at commit [`81339dc`](https://github.com/apache/spark/commit/81339dc429104633ee28cf078f643b5050564557).
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766

**[Test build #66791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66791/consoleFull)** for PR 9766 at commit [`9de8c0e`](https://github.com/apache/spark/commit/9de8c0e7c0a2108b519c8adce7af5162578b04c9).
[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15295 Merging to master! Thanks!