[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19470

**[Test build #82720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82720/testReport)** for PR 19470 at commit [`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).

* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19470

Merged build finished. Test FAILed.
[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19488

**[Test build #82722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82722/testReport)** for PR 19488 at commit [`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).

* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19488

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82722/
[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19269

**[Test build #82719 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82719/testReport)** for PR 19269 at commit [`ac3de3c`](https://github.com/apache/spark/commit/ac3de3c2c5c50df8e5f2c618a78a4fa4c6756797).

* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WriteToDataSourceV2Command(writer: DataSourceV2Writer, query: LogicalPlan)`
  * `class RowToInternalRowDataWriterFactory(`
  * `class RowToInternalRowDataWriter(rowWriter: DataWriter[Row], encoder: ExpressionEncoder[Row])`
[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19488

Merged build finished. Test FAILed.
[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19470

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82720/
[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19489

@dahaian close this please
[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19269

Merged build finished. Test FAILed.
[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19269

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82719/
[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19470

Retest this please.
[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19470

The R failure seems to be unrelated:

```
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/R/run-tests.sh ; process was terminated by signal 9
Attempting to post to Github...
```
[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19470

**[Test build #82724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82724/testReport)** for PR 19470 at commit [`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).
[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18732

A late question: shall we create another API for it instead of reusing `pandas_udf`? cc @ueshin
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144480289

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
```
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
     }
   }
 
+  test("SPARK-6: group splitted expressions into one method per nested class") {
```
--- End diff --

Thank you very much for your help @viirya! In my use cases the issue seemed tied to the `dropDuplicates` method, so I focused on it, but thanks to your suggestion I now realize that `dropDuplicates` by itself is not enough: it also needs some functions applied to the columns to trigger the issue. Thank you so much. Where should I add this test case?
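[Editor's note] A hypothetical repro sketch of what the two commenters describe: `dropDuplicates` on its own is harmless, but combined with many derived columns it can make codegen split expressions across a large number of methods. The column count and names below are illustrative, not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DropDuplicatesRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()

    // A wide DataFrame: each derived column adds expressions to codegen.
    val wide = spark.range(100).select(
      (0 until 1000).map(i => (col("id") + i).alias(s"c$i")): _*)

    // dropDuplicates alone is not enough; over the derived columns above it
    // can produce enough split methods to stress the class constant pool.
    wide.dropDuplicates().count()

    spark.stop()
  }
}
```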
[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...
Github user dahaian commented on the issue: https://github.com/apache/spark/pull/19489

@srowen Sorry, I was just asking a question. If someone can help me, I will close this patch.
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144480542

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
```
@@ -798,10 +831,46 @@ class CodegenContext {
            |  ${makeSplitFunction(body)}
            |}
          """.stripMargin
-        addNewFunction(name, code)
+        addNewFunctionInternal(name, code, inlineToOuterClass = false)
       }
 
-    foldFunctions(functions.map(name => s"$name(${arguments.map(_._2).mkString(", ")})"))
+    // Here we store all the methods which have been added to the outer class.
+    val outerClassFunctions = functions
+      .filter(_.subclassName.isEmpty)
+      .map(_.functionName)
+
+    // Here we handle all the methods which have been added to the nested subclasses and
+    // not to the outer class.
+    // Since they can be many, their direct invocation in the outer class adds many entries
+    // to the outer class' constant pool. This can cause the constant pool to past JVM limit.
+    // To avoid this problem, we group them and we call only the grouping methods in the
+    // outer class.
+    val innerClassFunctions = functions
+      .filter(_.subclassName.isDefined)
+      .foldLeft(ListMap.empty[(String, String), Seq[String]]) { case (acc, f) =>
+        val key = (f.subclassName.get, f.subclassInstance.get)
+        acc.updated(key, acc.getOrElse(key, Seq.empty[String]) ++ Seq(f.functionName))
+      }
+      .flatMap { case ((subclassName, subclassInstance), subclassFunctions) =>
+        if (subclassFunctions.size > CodeGenerator.MERGE_SPLIT_METHODS_THRESHOLD) {
+          // Adding a new function to each subclass which contains
+          // the invocation of all the ones which have been added to
+          // that subclass
+          val code = s"""
+            |private $returnType $func($argString) {
+            |  ${makeSplitFunction(foldFunctions(subclassFunctions.map(name =>
+                 s"$name(${arguments.map(_._2).mkString(", ")})")))}
```
--- End diff --

We are inside the string interpolator here, so the `|` is not missing.
[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18732

I think @viirya raised this question too - https://github.com/apache/spark/pull/18732#issuecomment-333065737 - and I think I also left a few worries about this here and there. To me, +0.
[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19489

This isn't where you ask questions, and the branch itself is messed up. Please close it. Questions go to the mailing list.
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144480771

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
```
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
     }
   }
 
+  test("SPARK-6: group splitted expressions into one method per nested class") {
```
--- End diff --

I was adding it to `DataFrameAggregateSuite`.
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144481305

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
```
@@ -1010,6 +1079,8 @@ object CodeGenerator extends Logging {
   // This is the value of HugeMethodLimit in the OpenJDK JVM settings
   val DEFAULT_JVM_HUGE_METHOD_LIMIT = 8000
 
+  val MERGE_SPLIT_METHODS_THRESHOLD = 3
```
--- End diff --

@gatorsmile it comes from [this discussion](https://github.com/apache/spark/pull/19480/files/bdc2fdb18688b86a1b30d6ea8a692a3e95858049#r144282402).
[GitHub] spark issue #19490: [Trivial][DOC] update code style for InteractionExample
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19490

I don't think this is a style fix? This is too trivial to bother with.
[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...
Github user dahaian commented on the issue: https://github.com/apache/spark/pull/19489

@srowen Sorry, I do not understand what you mean by "the mailing list". Please tell me where I can ask a question, and I will close this patch.
[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19488

retest this please
[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19489

This should be closed no matter what. Please start at the web site: http://spark.apache.org/community.html
[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19488

**[Test build #82725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82725/testReport)** for PR 19488 at commit [`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).
[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19464

**[Test build #82726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82726/testReport)** for PR 19464 at commit [`534d8fb`](https://github.com/apache/spark/commit/534d8fbcd7dfbdc9af06a4d926f6a353f429fce8).
[GitHub] spark pull request #19489: The declared package "org.apache.hive.service.cli...
Github user dahaian closed the pull request at: https://github.com/apache/spark/pull/19489
[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19419#discussion_r144483070

--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
```
@@ -89,6 +90,14 @@ private[spark] object JettyUtils extends Logging {
         val result = servletParams.responder(request)
         response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
         response.setHeader("X-Frame-Options", xFrameOptionsValue)
+        conf.get(UI_X_XSS_PROTECTION).foreach(response.setHeader("X-XSS-Protection", _))
+        if (conf.get(UI_X_CONTENT_TYPE_OPTIONS.key).equals("true")) {
```
--- End diff --

At the moment this would fail if unset.
[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19419#discussion_r144482977

--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
```
@@ -420,6 +420,25 @@ package object config {
       .toSequence
       .createWithDefault(Nil)
 
+  private[spark] val UI_X_XSS_PROTECTION =
+    ConfigBuilder("spark.ui.xXssProtection")
+      .doc("Value for HTTP X-XSS-Protection response header")
+      .stringConf
+      .createOptional
+
+  private[spark] val UI_X_CONTENT_TYPE_OPTIONS =
+    ConfigBuilder("spark.ui.xContentTypeOptions.enabled")
+      .doc("Set to 'true' for setting X-Content-Type-Options HTTP response header to 'nosniff'")
+      .stringConf
```
--- End diff --

This should be a boolean option then?
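[Editor's note] A minimal sketch of what the suggested change might look like, assuming Spark's internal `ConfigBuilder` API; illustrative only, not the code that was eventually merged:

```scala
// Declaring the flag as a boolean config: conf.get(...) then returns a
// Boolean, and an unset key falls back to the default instead of failing.
private[spark] val UI_X_CONTENT_TYPE_OPTIONS =
  ConfigBuilder("spark.ui.xContentTypeOptions.enabled")
    .doc("Set to 'true' for setting X-Content-Type-Options HTTP response header to 'nosniff'")
    .booleanConf
    .createWithDefault(false)

// Corresponding use in JettyUtils, replacing the string comparison above:
if (conf.get(UI_X_CONTENT_TYPE_OPTIONS)) {
  response.setHeader("X-Content-Type-Options", "nosniff")
}
```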
[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19419#discussion_r144483296

--- Diff: docs/configuration.md --- (HTML table markup reconstructed; tags were stripped by the archive)
```
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also available, and may be useful
 
+### HTTP Security Headers
+Apache Spark can be configured to include HTTP Headers which aids in preventing Cross
+Site Scripting (XSS), Cross-Frame Scripting (XFS), MIME-Sniffing and also enforces HTTP
+Strict Transport Security.
+
+<table>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td>spark.ui.xXssProtection</td>
+  <td>None</td>
+  <td>
+    Value for HTTP X-XSS-Protection response header. You can choose appropriate value
+    from below:
+    <li>0 (Disables XSS filtering)
```
--- End diff --

Markdown won't work here? I think it won't. Close the `<li>` tags though. It's also not clear below what values the user sets. The values are `0`, `1`, or `1; mode=block`, and that should be clear.
[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19419#discussion_r144483342

--- Diff: docs/configuration.md --- (HTML table markup reconstructed; tags were stripped by the archive)
```
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also available, and may be useful
 
+### HTTP Security Headers
+Apache Spark can be configured to include HTTP Headers which aids in preventing Cross
+Site Scripting (XSS), Cross-Frame Scripting (XFS), MIME-Sniffing and also enforces HTTP
+Strict Transport Security.
+
+<table>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td>spark.ui.xXssProtection</td>
+  <td>None</td>
+  <td>
+    Value for HTTP X-XSS-Protection response header. You can choose appropriate value
+    from below:
+    <li>0 (Disables XSS filtering)
+    <li>1 (Enables XSS filtering. If a cross-site scripting attack is detected,
+      the browser will sanitize the page.)
+    <li>1; mode=block (Enables XSS filtering. The browser will prevent rendering
+      of the page if an attack is detected.)
+  </td>
+</tr>
+
+<tr>
+  <td>spark.ui.allowFramingFrom</td>
+  <td>SAMEORIGIN</td>
+  <td>
+    Value for X-Frame-Options HTTP response header.
+    You can provide the "website uri" which can only be displayed in a frame on
+    the specified origin.
+
```
--- End diff --

Remove this.
[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...
Github user dahaian commented on the issue: https://github.com/apache/spark/pull/19489

@srowen I closed it.
[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19488#discussion_r144483555

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
```
@@ -205,14 +205,15 @@ object PhysicalAggregation {
     case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
       // A single aggregate expression might appear multiple times in resultExpressions.
       // In order to avoid evaluating an individual aggregate function multiple times, we'll
-      // build a set of the distinct aggregate expressions and build a function which can
+      // build a map of the distinct aggregate expressions and build a function which can
       // be used to re-write expressions so that they reference the single copy of the
       // aggregate function which actually gets computed.
-      val aggregateExpressions = resultExpressions.flatMap { expr =>
+      val aggregateExpressionMap = resultExpressions.flatMap { expr =>
         expr.collect {
-          case agg: AggregateExpression => agg
+          case agg: AggregateExpression => (agg.canonicalized, agg.deterministic) -> agg
```
--- End diff --

I think non-deterministic functions should not be deduplicated; e.g. `select max(a + rand()), max(a + rand()) from ...` should still evaluate 2 aggregate functions. My suggestion:

```scala
val aggregateExpressions = resultExpressions.flatMap { expr =>
  expr.collect {
    case agg: AggregateExpression => agg
  }
}

val aggregateExpressionMap = aggregateExpressions.filter(_.deterministic).map { agg =>
  agg.canonicalized -> agg
}.toMap
```
[GitHub] spark issue #19476: [SPARK-22062][CORE] Spill large block to disk in BlockMa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19476

**[Test build #82727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82727/testReport)** for PR 19476 at commit [`2d00fd2`](https://github.com/apache/spark/commit/2d00fd25043c40ab271f4da1dc5b4dd7a407fc6f).
[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19419#discussion_r144488504

--- Diff: docs/configuration.md ---
```
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also available, and may be useful
 
+### HTTP Security Headers
```
--- End diff --

I think this could move to `security.md` as a security-related advanced configuration.
[GitHub] spark issue #19363: [SPARK-22224][SQL]Override toString of KeyValue/Relation...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/19363

cc @viirya @cloud-fan @gatorsmile
[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...
Github user krishna-pandey commented on a diff in the pull request: https://github.com/apache/spark/pull/19419#discussion_r144495102

--- Diff: docs/configuration.md --- (HTML table markup reconstructed; tags were stripped by the archive)
```
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also available, and may be useful
 
+### HTTP Security Headers
+Apache Spark can be configured to include HTTP Headers which aids in preventing Cross
+Site Scripting (XSS), Cross-Frame Scripting (XFS), MIME-Sniffing and also enforces HTTP
+Strict Transport Security.
+
+<table>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td>spark.ui.xXssProtection</td>
+  <td>None</td>
+  <td>
+    Value for HTTP X-XSS-Protection response header. You can choose appropriate value
+    from below:
+    <li>0 (Disables XSS filtering)
+    <li>1 (Enables XSS filtering. If a cross-site scripting attack is detected,
+      the browser will sanitize the page.)
+    <li>1; mode=block (Enables XSS filtering. The browser will prevent rendering
+      of the page if an attack is detected.)
+  </td>
+</tr>
+
+<tr>
+  <td>spark.ui.allowFramingFrom</td>
+  <td>SAMEORIGIN</td>
+  <td>
+    Value for X-Frame-Options HTTP response header.
+    You can provide the "website uri" which can only be displayed in a frame on
+    the specified origin.
+
```
--- End diff --

Done.
[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...
Github user krishna-pandey commented on a diff in the pull request: https://github.com/apache/spark/pull/19419#discussion_r144495051

--- Diff: docs/configuration.md --- (HTML table markup reconstructed; tags were stripped by the archive)
```
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also available, and may be useful
 
+### HTTP Security Headers
+Apache Spark can be configured to include HTTP Headers which aids in preventing Cross
+Site Scripting (XSS), Cross-Frame Scripting (XFS), MIME-Sniffing and also enforces HTTP
+Strict Transport Security.
+
+<table>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td>spark.ui.xXssProtection</td>
+  <td>None</td>
+  <td>
+    Value for HTTP X-XSS-Protection response header. You can choose appropriate value
+    from below:
+    <li>0 (Disables XSS filtering)
```
--- End diff --

Removed the lists; however, they are used in many places in the existing configuration.md.
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144500085

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
```
@@ -564,4 +565,30 @@ class KeyValueGroupedDataset[K, V] private[sql](
       encoder: Encoder[R]): Dataset[R] = {
     cogroup(other)((key, left, right) => f.call(key, left.asJava, right.asJava).asScala)(encoder)
   }
+
+  override def toString: String = {
+    try {
+      val builder = new StringBuilder
+      val kFields = kExprEnc.schema.map {
+        case f => s"${f.name}: ${f.dataType.simpleString(2)}"
+      }
+      val vFields = vExprEnc.schema.map {
+        case f => s"${f.name}: ${f.dataType.simpleString(2)}"
+      }
+      builder.append("[key: [")
+      builder.append(kFields.take(2).mkString(", "))
+      if (kFields.length > 2) {
+        builder.append(" ... " + (kFields.length - 2) + " more field(s)")
+      }
+      builder.append("], value: [")
+      builder.append(vFields.take(2).mkString(", "))
+      if (vFields.length > 2) {
+        builder.append(" ... " + (vFields.length - 2) + " more field(s)")
+      }
+      builder.append("]]").toString()
+    } catch {
+      case NonFatal(e) =>
```
--- End diff --

Although some methods of `StringBuilder` can throw exceptions, I don't see how a simple `append` can throw one, and accessing `schema` should not throw an exception at all. So I don't get why the try/catch is there.
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144501685

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
```
@@ -465,6 +466,24 @@ class RelationalGroupedDataset protected[sql](
     Dataset.ofRows(df.sparkSession, plan)
   }
 
+  override def toString: String = {
+    try {
+      val builder = new StringBuilder
+      builder.append("RelationalGroupedDataset: [key: [")
+      val kFields = groupingExprs.map(_.asInstanceOf[NamedExpression]).map {
+        case f => s"${f.name}: ${f.dataType.simpleString(2)}"
+      }
+      builder.append(kFields.take(2).mkString(", "))
+      if (kFields.length > 2) {
+        builder.append(" ... " + (kFields.length - 2) + " more field(s)")
+      }
+      builder.append(s"], value: ${df.toString}, $groupType]").toString()
```
--- End diff --

`value:` or `df:`? `$groupType` -> `type: $groupType`?
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144501966

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
```
@@ -46,6 +36,20 @@ abstract class QueryTest extends PlanTest {
   Locale.setDefault(Locale.US)
 
   /**
+   * Makes sure the answer matches the expected result
+   */
+  def checkString(expected: String, actual: String): Unit = {
+    if (expected != actual) {
+      fail(
+        "KeyValueGroupedDataset.toString() gives wrong result:\n\n" + sideBySide(
```
--- End diff --

You don't use this only against `KeyValueGroupedDataset`.
[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19470

**[Test build #82724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82724/testReport)** for PR 19470 at commit [`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19470

Merged build finished. Test PASSed.
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144500697

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
```
@@ -465,6 +466,24 @@ class RelationalGroupedDataset protected[sql](
     Dataset.ofRows(df.sparkSession, plan)
   }
 
+  override def toString: String = {
+    try {
+      val builder = new StringBuilder
+      builder.append("RelationalGroupedDataset: [key: [")
+      val kFields = groupingExprs.map(_.asInstanceOf[NamedExpression]).map {
```
--- End diff --

Are all expressions in `groupingExprs` `NamedExpression`?
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144502598

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/RelationalGroupedDatasetSuite.scala ---
```
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.SharedSQLContext
+
+class RelationalGroupedDatasetSuite extends QueryTest with SharedSQLContext {
+  import testImplicits._
+
+  test("Check RelationalGroupedDataset toString: Single data") {
+    val kvDataset = (1 to 3).toDF("id").as[SingleData].groupBy("id")
```
--- End diff --

Is `SingleData` necessary?
[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19470

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82724/
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144502680

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/RelationalGroupedDatasetSuite.scala ---
```
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.SharedSQLContext
+
+class RelationalGroupedDatasetSuite extends QueryTest with SharedSQLContext {
+  import testImplicits._
+
+  test("Check RelationalGroupedDataset toString: Single data") {
+    val kvDataset = (1 to 3).toDF("id").as[SingleData].groupBy("id")
+    val expected = "RelationalGroupedDataset: [key: [id: int], value: [id: int], GroupByType]"
+    val actual = kvDataset.toString
+    checkString(expected, actual)
+  }
+
+  test("Check RelationalGroupedDataset toString: over length schema ") {
+    val kvDataset = (1 to 3).map( x => (x, x.toString, x.toLong))
+      .toDF("id", "val1", "val2").as[TripleData].groupBy("id")
```
--- End diff --

Is `TripleData` necessary?
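[Editor's note] A small sketch of the simplification the reviewer seems to be hinting at: grouping the plain DataFrame directly, without the intermediate case-class Dataset. The expected string is copied from the test above; whether the case classes are needed elsewhere is an open question in the thread:

```scala
// Inside the suite (testImplicits in scope): groupBy works directly on the
// DataFrame column, so the .as[SingleData] conversion looks unnecessary here.
val grouped = (1 to 3).toDF("id").groupBy("id")
checkString(
  "RelationalGroupedDataset: [key: [id: int], value: [id: int], GroupByType]",
  grouped.toString)
```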
[GitHub] spark pull request #19491: [SPARK-22273][SQL] Fix key/value schema field nam...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19491

[SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators.

## What changes were proposed in this pull request?

When fixing schema field names using escape characters with `addReferenceMinorObj()` at [SPARK-18952](https://issues.apache.org/jira/browse/SPARK-18952) (#16361), the double-quotes around the names remained, so the names became something like `"((java.lang.String) references[1])"`.

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[1])", org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("((java.lang.String) references[2])", org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

We should remove the double-quotes so that the values in `references` are referenced properly:

```java
/* 055 */     private int maxSteps = 2;
/* 056 */     private int numRows = 0;
/* 057 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[1]), org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add(((java.lang.String) references[2]), org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */     private Object emptyVBase;
```

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-22273

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19491.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19491

commit c23e3d060a473c1d2558c60689a6f552ccbbe6f2
Author: Takuya UESHIN
Date: 2017-09-28T13:33:35Z

    Remove double-quotes around field name.
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144501447

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
```
@@ -465,6 +466,24 @@ class RelationalGroupedDataset protected[sql](
     Dataset.ofRows(df.sparkSession, plan)
   }
 
+  override def toString: String = {
+    try {
+      val builder = new StringBuilder
+      builder.append("RelationalGroupedDataset: [key: [")
```
--- End diff --

`key:` or `groupingBy:`?
[GitHub] spark issue #19491: [SPARK-22273][SQL] Fix key/value schema field names in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19491

**[Test build #82728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82728/testReport)** for PR 19491 at commit [`c23e3d0`](https://github.com/apache/spark/commit/c23e3d060a473c1d2558c60689a6f552ccbbe6f2).
[GitHub] spark issue #17290: [SPARK-16599][CORE] java.util.NoSuchElementException: No...
Github user aphasingnirvana commented on the issue: https://github.com/apache/spark/pull/17290

@srowen I believe the bug still persists. Shouldn't we reopen it?
[GitHub] spark issue #19487: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19487

LGTM. Today I'm going to put out a slight revision of my PathOutputCommitter class, which sits one layer above FileOutputCommitter: it lets people write committers without output and work paths, yet avoids going near the complexity that is FileOutputFormat. See [PathOutputCommitter](https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/com/hortonworks/spark/cloud/commit/PathOutputCommitProtocol.scala); if I get my changes into Hadoop 3, then this committer will work for all versions of Hadoop 3.x, even though the S3A stuff is targeting Hadoop 3.1.
[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/18979#discussion_r144505454

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ---
```
@@ -57,7 +60,14 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
   private def getFileSize(filePath: String): Long = {
     val path = new Path(filePath)
     val fs = path.getFileSystem(hadoopConf)
-    fs.getFileStatus(path).getLen()
+    try {
+      fs.getFileStatus(path).getLen()
+    } catch {
+      case e: FileNotFoundException =>
+        // may arise against eventually consistent object stores
+        logInfo(s"File $path is not yet visible", e)
```
--- End diff --

Say "Reported file size in job summary may be invalid"?
[GitHub] spark issue #17290: [SPARK-16599][CORE] java.util.NoSuchElementException: No...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17290

I don't think this change was the ultimate fix, and it caused another problem, so no I don't think this PR should be reopened.
[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18805

retest this please
[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18805

**[Test build #82729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82729/testReport)** for PR 18805 at commit [`029a753`](https://github.com/apache/spark/commit/029a753ad4be6881c4e1721eecdfaad0f8b158bd).
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979

Noted :)

@dongjoon-hyun: is the issue with ORC that, if there's nothing to write, it doesn't generate a file (thereby avoiding the problem where you sometimes get 0-byte ORC files and things downstream fail)? If so, the warning message @gatorsmile has proposed will potentially mislead people into worrying about a problem which isn't there, and the numFiles metric will mislead too. I'm also starting to worry about how noisy the log would be, both there and when working with S3 when it's exhibiting delayed visibility (rarer). My proposal (see the sketch after this message):

1. What if this patch just logged at debug: less noise, but still something there if people are trying to debug a mismatch?
2. If there's no file found, numFiles doesn't get incremented.
3. I count the number of files actually submitted.
4. In `getFinalStats()`, log at info if there is a mismatch.

This would line things up for eventually returning the list of expected vs. actual files as a metric where it could be reported.
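[Editor's note] A minimal sketch of the counting-and-final-log idea in points 3–4 above; the class, field, and method names here are hypothetical, not the actual `BasicWriteTaskStatsTracker` internals:

```scala
import org.apache.spark.internal.Logging

// Hypothetical tracker: count files the writer submitted vs. files whose
// status could actually be fetched, and emit a single INFO line at the end
// instead of one warning per missing file.
class FileCountingStatsTracker extends Logging {
  private var submittedFiles = 0
  private var foundFiles = 0

  def onFileSubmitted(): Unit = submittedFiles += 1
  def onFileFound(): Unit = foundFiles += 1

  // Called from the final-stats path; quiet unless there is a mismatch.
  def logFinalStats(): Unit = {
    if (foundFiles < submittedFiles) {
      logInfo(s"Expected $submittedFiles files, but only $foundFiles were found; " +
        "some outputs may be empty, or not yet visible on an eventually " +
        "consistent object store.")
    }
  }
}
```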
[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19488

**[Test build #82725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82725/testReport)** for PR 19488 at commit [`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19488

Merged build finished. Test PASSed.
[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19488

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82725/
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979

**[Test build #82730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)** for PR 18979 at commit [`adab985`](https://github.com/apache/spark/commit/adab985d0455f3f549ebceb03528f0a6a45a31c0).
[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19464

**[Test build #82726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82726/testReport)** for PR 19464 at commit [`534d8fb`](https://github.com/apache/spark/commit/534d8fbcd7dfbdc9af06a4d926f6a353f429fce8).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19464

Merged build finished. Test PASSed.
[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19464

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82726/
[GitHub] spark issue #19476: [SPARK-22062][CORE] Spill large block to disk in BlockMa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19476

**[Test build #82727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82727/testReport)** for PR 19476 at commit [`2d00fd2`](https://github.com/apache/spark/commit/2d00fd25043c40ab271f4da1dc5b4dd7a407fc6f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19476: [SPARK-22062][CORE] Spill large block to disk in BlockMa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19476

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82727/
[GitHub] spark issue #19476: [SPARK-22062][CORE] Spill large block to disk in BlockMa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19476

Merged build finished. Test PASSed.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user amankothari04 commented on the issue: https://github.com/apache/spark/pull/16578

@viirya did you get a chance to review this?
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979

**[Test build #82731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82731/testReport)** for PR 18979 at commit [`649f8da`](https://github.com/apache/spark/commit/649f8da245443567b842b697a4d47e5241eb5946).
[GitHub] spark pull request #19452: [SPARK-22136][SS] Evaluate one-sided conditions e...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19452#discussion_r144532481

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingSymmetricHashJoinHelperSuite.scala ---
```
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import org.apache.spark.sql.Column
+import org.apache.spark.sql.catalyst.analysis.SimpleAnalyzer
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.execution.{LeafExecNode, LocalTableScanExec, SparkPlan}
+import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
+import org.apache.spark.sql.execution.streaming.StreamingSymmetricHashJoinHelper.JoinConditionSplitPredicates
+import org.apache.spark.sql.types.DataTypes
+
+class StreamingSymmetricHashJoinHelperSuite extends StreamTest {
+  import org.apache.spark.sql.functions._
+
+  val attributeA = AttributeReference("a", DataTypes.IntegerType)()
+  val attributeB = AttributeReference("b", DataTypes.IntegerType)()
+  val attributeC = AttributeReference("c", DataTypes.IntegerType)()
+  val attributeD = AttributeReference("d", DataTypes.IntegerType)()
+  val colA = new Column(attributeA)
+  val colB = new Column(attributeB)
+  val colC = new Column(attributeC)
+  val colD = new Column(attributeD)
+
+  val left = new LocalTableScanExec(Seq(attributeA, attributeB), Seq())
+  val right = new LocalTableScanExec(Seq(attributeC, attributeD), Seq())
+
+  test("empty") {
+    val split = JoinConditionSplitPredicates(None, left, right)
+    assert(split.leftSideOnly.isEmpty)
+    assert(split.rightSideOnly.isEmpty)
+    assert(split.bothSides.isEmpty)
+    assert(split.full.isEmpty)
+  }
+
+  test("only literals") {
+    // Literal-only conjuncts end up on the left side because that's the first bucket they fit in.
+    // There's no semantic reason they couldn't be in any bucket.
+    val predicate = (lit(1) < lit(5) && lit(6) < lit(7) && lit(0) === lit(-1)).expr
+    val split = JoinConditionSplitPredicates(Some(predicate), left, right)
+
+    assert(split.leftSideOnly.contains(predicate))
+    assert(split.rightSideOnly.isEmpty)
```
--- End diff --

Nit: why should rightSideOnly be empty? The literals CAN be evaluated using the right side only. This feels like a very asymmetrical situation.
[GitHub] spark pull request #19452: [SPARK-22136][SS] Evaluate one-sided conditions e...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19452#discussion_r144533029

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingSymmetricHashJoinHelperSuite.scala ---
```
+class StreamingSymmetricHashJoinHelperSuite extends StreamTest {
+  import org.apache.spark.sql.functions._
+
+  val attributeA = AttributeReference("a", DataTypes.IntegerType)()
+  val attributeB = AttributeReference("b", DataTypes.IntegerType)()
+  val attributeC = AttributeReference("c", DataTypes.IntegerType)()
+  val attributeD = AttributeReference("d", DataTypes.IntegerType)()
```
--- End diff --

The tests would be much more intuitive, and therefore easier to read, if these columns and variables were named such that it is obvious whether they belong to the left or the right side. E.g. leftAttribute1 instead of attributeA, leftAttribute2 instead of attributeB, etc.
[GitHub] spark issue #19491: [SPARK-22273][SQL] Fix key/value schema field names in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19491

**[Test build #82728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82728/testReport)** for PR 19491 at commit [`c23e3d0`](https://github.com/apache/spark/commit/c23e3d060a473c1d2558c60689a6f552ccbbe6f2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19491: [SPARK-22273][SQL] Fix key/value schema field names in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19491

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82728/
[GitHub] spark issue #19491: [SPARK-22273][SQL] Fix key/value schema field names in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19491

Merged build finished. Test PASSed.
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82732/testReport)** for PR 18979 at commit [`d3f96f6`](https://github.com/apache/spark/commit/d3f96f63f263f134a79e76e8bdd1961f333b7c7a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144541081

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
     }
   }

+  test("SPARK-22226: group splitted expressions into one method per nested class") {
--- End diff --

@viirya I have good news and bad news... Thanks to your suggestion I have been able to understand and reproduce the issue. Moreover, I also found another issue which is fixed by this PR, and I am adding a UT for that too: in some cases we might hit a

```
Code of method apply(...) grows beyond 64 KB
```

error, and with this PR that problem is fixed. The bad news is that the UT you provided still fails, but with a different error: it is still a Constant Pool limit exceeded exception, but now it occurs in a NestedClass. From my analysis, this is caused by another problem, i.e. that we might reference too many fields of the outer class from the NestedClasses. This might be addressed by tuning the magic number which I brought to 1000k in this PR, but I am pretty sure it will also be addressed by the ongoing PR for SPARK-18016, which is trying to reduce the number of variables. Thus I consider this out of scope for this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
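For readers following the codegen discussion, a much-simplified sketch of the grouping idea (the real CodegenContext splits on the size of the generated code, which is what the 1000k threshold above refers to, not on a method count):

```scala
// Illustrative only: distribute split expression methods across nested classes
// so that no single generated class accumulates too many constant-pool entries.
val methodsPerNestedClass = 500 // hypothetical cap, chosen here just for the demo

def groupIntoNestedClasses(splitMethods: Seq[String]): String =
  splitMethods
    .grouped(methodsPerNestedClass)
    .zipWithIndex
    .map { case (methods, i) =>
      s"""private class NestedClass$i {
         |${methods.mkString("\n")}
         |}""".stripMargin
    }
    .mkString("\n")
```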
[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19448 Thanks for reviewing this/getting it in. Personally, I had it in the "improvement" category rather than bug fix. If it weren't for that line in the docs, there'd be no ambiguity about improvement vs. fix, and there is always a lower-risk way to fix a doc/code mismatch: change the docs. But I'm grateful for it being in; with the backport to branch-2, Ryan should be able to use it semi-immediately. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19488 **[Test build #82733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82733/testReport)** for PR 19488 at commit [`1cca72b`](https://github.com/apache/spark/commit/1cca72ba34a9e29cf51b56f7e9c8840c75ffe5da). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19480 **[Test build #82734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82734/testReport)** for PR 19480 at commit [`c4601b4`](https://github.com/apache/spark/commit/c4601b4ae1f6ba618e1a92e75b06faaf8d14bee3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19492

[SPARK-22228][SQL] Add support for array to from_json

## What changes were proposed in this pull request?

The fix introduces support for parsing arrays of primitive types in the `from_json` function. Currently, it supports only arrays of structs, which are equivalent to arrays of JSON objects, but it doesn't allow parsing arrays of simple types, like arrays of strings (e.g. `["this is a valid JSON", "we cannot parse before the PR"]`).

## How was this patch tested?

Added UTs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-22228

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19492.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19492

commit cba1f4accaf5dbe305e366eab33879577d7f9a93
Author: Marco Gaido
Date: 2017-10-11T10:29:26Z

    [SPARK-22228] Add support for array to from_json

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
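With the change in place, parsing an array of strings would look roughly like this (a sketch assuming a running `SparkSession` named `spark` with this PR applied):

```scala
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{ArrayType, StringType}

import spark.implicits._

val df = Seq("""["this is a valid JSON", "we cannot parse before the PR"]""").toDF("json")

// Previously only struct (and array-of-struct) schemas were accepted here.
df.select(from_json($"json", ArrayType(StringType))).show(false)
```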
[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19448 PS, for people who are interested in dynamic committers, [MAPREDUCE-6823](https://issues.apache.org/jira/browse/MAPREDUCE-6823) is something to look at. It allows you to switch committers under pretty much everything *other than Parquet*... this patch helps make Parquet manageable too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
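As a concrete example of what MAPREDUCE-6823 enables, here is a hedged sketch of binding a committer factory to a filesystem scheme (key and class names as I recall them from the Hadoop 3.x docs; verify against your Hadoop version):

```scala
import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
// Bind an alternative committer factory to every s3a:// output path.
conf.set(
  "mapreduce.outputcommitter.factory.scheme.s3a",
  "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory")
```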
[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/19488#discussion_r144543377

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -205,14 +205,15 @@ object PhysicalAggregation {
     case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
       // A single aggregate expression might appear multiple times in resultExpressions.
       // In order to avoid evaluating an individual aggregate function multiple times, we'll
-      // build a set of the distinct aggregate expressions and build a function which can
+      // build a map of the distinct aggregate expressions and build a function which can
       // be used to re-write expressions so that they reference the single copy of the
       // aggregate function which actually gets computed.
-      val aggregateExpressions = resultExpressions.flatMap { expr =>
+      val aggregateExpressionMap = resultExpressions.flatMap { expr =>
         expr.collect {
-          case agg: AggregateExpression => agg
+          case agg: AggregateExpression => (agg.canonicalized, agg.deterministic) -> agg
--- End diff --

Thank you, @cloud-fan! You've made a very good point! We should handle non-determinism correctly here. I have confirmed, though, that all aggregate functions should be deterministic (and fortunately Spark already guarantees this in the analysis stage). So I think we are safe to use `agg.canonicalized` without considering the `deterministic` field here, which also avoids confusion. I have also added a code comment addressing this point in my latest commit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
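Reduced to a sketch (not the PR's exact code), the deduplication idea is that semantically equal aggregates share one canonicalized form, so keying on that form collapses duplicates:

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression

// Keep a single representative per canonicalized form; rewrites can then point
// every duplicate at the one copy that actually gets computed.
def deduplicateAggregates(
    aggs: Seq[AggregateExpression]): Map[Expression, AggregateExpression] =
  aggs.groupBy(_.canonicalized).map { case (canonical, exprs) => canonical -> exprs.head }
```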
[GitHub] spark issue #19492: [SPARK-22228][SQL] Add support for array...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19492 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 The latest PR update pulls in @dongjoon-hyun's new test; to avoid a merge conflict in the Insert suite I've rebased against master.

1. Everything handles missing files on output.
2. There's only one logInfo at the end of the execute call, so if many empty files are created, the logs aren't too noisy.
3. There is now some implicit counting of how many files were missing (`= submittedFiles - numFiles`); this isn't aggregated or reported, though.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
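Point 3 in code form, as a trivial hedged sketch (the two counters exist in the stats tracker per the comment above; the helper itself is hypothetical):

```scala
// The missing-file count falls out of counters the tracker already maintains,
// but it is not yet aggregated or reported anywhere.
def missingFiles(submittedFiles: Long, numFiles: Long): Long =
  submittedFiles - numFiles
```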
[GitHub] spark pull request #19487: [SPARK-21549][CORE] Respect OutputFormats with no...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19487#discussion_r144545827

--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -60,15 +71,6 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)
    */
   private def absPathStagingDir: Path = new Path(path, "_temporary-" + jobId)
--- End diff --

What if path is null when passed in here? I'd actually expect an IllegalArgumentException to be raised, at least from looking at Path.checkPathArg(). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
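A quick way to see the failure mode described above (a sketch; Hadoop's `Path(String, String)` constructor validates its arguments before any I/O happens):

```scala
import org.apache.hadoop.fs.Path

val jobId = "job_0001" // hypothetical job id, just for illustration

// With a null parent, Path's argument check throws an
// IllegalArgumentException ("Can not create a Path from a null string").
val dir = new Path(null: String, "_temporary-" + jobId)
```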
[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19448 I guess we wouldn't change the docs in branch-2.2 alone as we have a safe fix here for this mismatch anyway. I think I just wanted to say this backport can be justified. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19487: [SPARK-21549][CORE] Respect OutputFormats with no/invali...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19487 Looking a bit more at this. I see it handles null as well as empty, and also other forms of invalid URI which Path can't handle today (multiple colons, except with file:// on Windows, etc.). And as long as you don't call `absPathStagingDir` you don't get an exception there. Do you actually want the code to downgrade if an invalid URI is passed in, that is: only skip if the path is empty/null, but not for other things like a path of "::"? There you may want to reject it. In which case you'd change the `hasValidPath` query to just look at the string, and reinstate the test for the path with an invalid non-empty URI. BTW, "test:" is possibly a valid path for the home directory in the scheme 'test'. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
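The stricter variant floated above would look something like this hedged sketch (hypothetical; the PR's actual check may differ):

```scala
// Skip the absolute-path staging machinery only when no path was supplied at
// all; a malformed but non-empty URI such as "::" should instead fail fast.
def hasValidPath(path: String): Boolean =
  path != null && path.nonEmpty
```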
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18979 **[Test build #82730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)** for PR 18979 at commit [`adab985`](https://github.com/apache/spark/commit/adab985d0455f3f549ebceb03528f0a6a45a31c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82730/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18979 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144554795

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
     }
   }

+  test("SPARK-22226: group splitted expressions into one method per nested class") {
--- End diff --

@mgaido91 Do you mean "2: compact primitive declarations into arrays" in SPARK-18016? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144555259

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,35 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
       Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
   }
+
+  test("SPARK-22226: splitExpressions should not cause \"Code of method grows beyond 64 KB\"") {
--- End diff --

`splitExpressions should not generate codes beyond 64KB`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/19263 @jerryshao Agreed on both points. Changed! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/18732 @cloud-fan, it's a good question. I thought quite a bit about it and discussed it with @viirya here: https://github.com/apache/spark/pull/18732#pullrequestreview-66106082 Just to recap, I think from an API perspective, having just one decorator `pandas_udf` makes it easier for users - they don't need to think about which decorator to use where. It does make the implementation a little more complicated, because some code has to interpret the context in which a pandas_udf is used, i.e., `pandas_udf` in `groupby().apply()` is a `pandas.DataFrame -> pandas.DataFrame` mapping, while in `withColumn` or `select` it's `pandas.Series -> pandas.Series`. Another thought is that even if we were to introduce something like `pandas_df_udf`, we might still run into issues in the future where, say, we want an aggregate pandas udf that defines a `pandas.Series -> scalar` mapping, so I don't think we can define a decorator for every input/output shape, because there can potentially be many. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144559514

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
     }
   }

+  test("SPARK-22226: group splitted expressions into one method per nested class") {
--- End diff --

@mgaido91 Thanks for trying it. Yeah, those expressions like `skewness` are very complicated, so they're likely to cause the issue you encountered. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18805 **[Test build #82729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82729/testReport)** for PR 18805 at commit [`029a753`](https://github.com/apache/spark/commit/029a753ad4be6881c4e1721eecdfaad0f8b158bd). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18805 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82729/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18805 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19481: [SPARK-21907][CORE][BACKPORT 2.2] oom during spill
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19481 @eyalfa can you close the PR? The GitHub infra does not do this automatically for backports. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org