[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21935 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93899/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21935 Merged build finished. Test FAILed.
[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21935 **[Test build #93899 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93899/testReport)** for PR 21935 at commit [`09ad6e9`](https://github.com/apache/spark/commit/09ad6e9f022740182312b29e20d5ff52778f63ed).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on the issue: https://github.com/apache/spark/pull/20838 @holdenk Did we miss the window? I still count 10 undefined names in this repo.
[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21950 Can one of the admins verify this patch?
[GitHub] spark issue #21950: [SPARK-24912][SQL][WIP] Add configuration to avoid OOM d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21950 **[Test build #93912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93912/testReport)** for PR 21950 at commit [`aa2a957`](https://github.com/apache/spark/commit/aa2a957751a906fe538822cace019014e763a8c3).
[GitHub] spark pull request #21950: [SPARK-24912][SQL][WIP] Add configuration to avoi...
GitHub user bersprockets opened a pull request: https://github.com/apache/spark/pull/21950 [SPARK-24912][SQL][WIP] Add configuration to avoid OOM during broadcast join (and other negative side effects of incorrect table sizing)

## What changes were proposed in this pull request?

Added configuration settings to help avoid OOM errors during broadcast joins.

- deser multiplication factor: Tell Spark to multiply totalSize by a specified factor for tables with encoded files (i.e., parquet or orc files). Spark will do this when calculating a table's sizeInBytes. This is modelled after Hive's hive.stats.deserialization.factor configuration setting.
- ignore rawDataSize: Due to HIVE-20079, rawDataSize is broken. This setting tells Spark to ignore rawDataSize when calculating the table's sizeInBytes.

One can partially simulate the deser multiplication factor without this change by decreasing the value of spark.sql.autoBroadcastJoinThreshold. However, that affects all tables, not just the ones that are encoded.

There is some awkwardness in that the check for file type (parquet or orc) uses Hive deser names, but the checks for partitioned tables need to be made outside of the Hive submodule. Still working that out.

## How was this patch tested?

Added unit tests. Also checked that I can avoid broadcast join OOM errors when using the deser multiplication factor on both my laptop and a cluster, and that I can avoid OOM errors using the ignore rawDataSize flag on my laptop.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bersprockets/spark SPARK-24914

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21950.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21950

commit aa2a957751a906fe538822cace019014e763a8c3
Author: Bruce Robbins
Date: 2018-07-26T00:36:17Z

    WIP version
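The sizing heuristic this PR describes can be sketched outside of Spark. The function and parameter names below (`size_in_bytes`, `deser_factor`, `ignore_raw_data_size`) are illustrative, not the PR's actual config keys; only the 10 MB default broadcast threshold is Spark's documented default for `spark.sql.autoBroadcastJoinThreshold`:

```python
# Hypothetical sketch of the estimate described above: for encoded formats
# (parquet/orc), inflate on-disk totalSize by a deserialization factor before
# comparing against the broadcast-join threshold; optionally ignore the
# broken rawDataSize statistic.
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024  # Spark's default, 10 MB

def size_in_bytes(total_size, raw_data_size=None, encoded=False,
                  deser_factor=1.0, ignore_raw_data_size=False):
    """Estimate a table's in-memory size from its file statistics."""
    if raw_data_size is not None and raw_data_size > 0 and not ignore_raw_data_size:
        return raw_data_size  # trust the uncompressed-size stat when usable
    if encoded:
        return int(total_size * deser_factor)  # compensate for columnar compression
    return total_size

def should_broadcast(estimate, threshold=AUTO_BROADCAST_JOIN_THRESHOLD):
    return estimate <= threshold

# A 5 MB parquet table that decompresses ~10x is no longer broadcast:
assert should_broadcast(size_in_bytes(5 * 1024 * 1024))
assert not should_broadcast(
    size_in_bytes(5 * 1024 * 1024, encoded=True, deser_factor=10.0))
```

This also shows why lowering the threshold alone is a blunt instrument: it penalizes every table, while the factor only inflates tables whose files are encoded.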
[GitHub] spark issue #21950: [SPARK-24912][SQL][WIP] Add configuration to avoid OOM d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21950 Can one of the admins verify this patch?
[GitHub] spark issue #21950: [SPARK-24912][SQL][WIP] Add configuration to avoid OOM d...
Github user holdensmagicalunicorn commented on the issue: https://github.com/apache/spark/pull/21950 @bersprockets, thanks! I am a bot who has found some folks who might be able to help with the review: @cloud-fan, @gatorsmile and @rxin
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #93911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93911/testReport)** for PR 21898 at commit [`94d3671`](https://github.com/apache/spark/commit/94d36719e03cdbda969c97603c7f96552372a07e).
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1587/ Test PASSed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test PASSed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21898 retest this please
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21946 Merged build finished. Test FAILed.
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21946 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93896/ Test FAILed.
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21948 Merged build finished. Test FAILed.
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21948 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93898/ Test FAILed.
[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21949 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93900/ Test PASSed.
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21946 **[Test build #93896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93896/testReport)** for PR 21946 at commit [`1f0c9a7`](https://github.com/apache/spark/commit/1f0c9a79dc1dc625ad5c821f87f8cdf6f471f445).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class OpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](`
  * `sealed class Hasher[@specialized(Long, Int, Double, Float) T] extends Serializable`
  * `class DoubleHasher extends Hasher[Double]`
  * `class FloatHasher extends Hasher[Float]`
  * `case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike`
  * `case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike`
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21948 **[Test build #93898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93898/testReport)** for PR 21948 at commit [`852c6f3`](https://github.com/apache/spark/commit/852c6f332bd8f7264cd9c3aae6325e3c84c80ff5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21949 Merged build finished. Test PASSed.
[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21949 **[Test build #93900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93900/testReport)** for PR 21949 at commit [`1f817a0`](https://github.com/apache/spark/commit/1f817a058887090ebf41c510a3b6086a062433d6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21915 **[Test build #93910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93910/testReport)** for PR 21915 at commit [`663b900`](https://github.com/apache/spark/commit/663b90004a962768d3fa718c6e3047e38e325519).
[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21915 Merged build finished. Test PASSed.
[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21915 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1586/ Test PASSed.
[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21915 retest this please
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21305 **[Test build #93909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93909/testReport)** for PR 21305 at commit [`922dc16`](https://github.com/apache/spark/commit/922dc164f6950b78223e2e421643bcce5b72a787).
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1585/ Test PASSed.
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21946 Merged build finished. Test PASSed.
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93891/ Test PASSed.
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21946 **[Test build #93891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93891/testReport)** for PR 21946 at commit [`6cac2b5`](https://github.com/apache/spark/commit/6cac2b54014d6b46913d4b86e77a6955cf0bc1e0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21946 +1
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21305 **[Test build #93908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93908/testReport)** for PR 21305 at commit [`ac7cb13`](https://github.com/apache/spark/commit/ac7cb13644a3f80c1627826513fa525154cf2d00).
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1584/ Test PASSed.
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21946 @rdblue This change is pretty isolated. It LGTM as well. Since you are fine with the change, I am assuming you are not blocking this. I will merge this soon.
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207043490

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2217,6 +2218,100 @@ class Analyzer(
     }
   }

+  /**
+   * Resolves columns of an output table from the data in a logical plan. This rule will:
+   *
+   * - Reorder columns when the write is by name
+   * - Insert safe casts when data types do not match
+   * - Insert aliases when column names do not match
+   * - Detect plans that are not compatible with the output table and throw AnalysisException
+   */
+  object ResolveOutputRelation extends Rule[LogicalPlan] {
+    override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case append @ AppendData(table, query, isByName)
+          if table.resolved && query.resolved && !append.resolved =>
+        val projection = resolveOutputColumns(table.name, table.output, query, isByName)
+
+        if (projection != query) {
+          append.copy(query = projection)
+        } else {
+          append
+        }
+    }
+
+    def resolveOutputColumns(
+        tableName: String,
+        expected: Seq[Attribute],
+        query: LogicalPlan,
+        byName: Boolean): LogicalPlan = {
+
+      if (expected.size < query.output.size) {
+        throw new AnalysisException(
+          s"""Cannot write to '$tableName', too many data columns:
+             |Table columns: ${expected.map(_.name).mkString(", ")}
+             |Data columns: ${query.output.map(_.name).mkString(", ")}""".stripMargin)
+      }
+
+      val errors = new mutable.ArrayBuffer[String]()
+      val resolved: Seq[NamedExpression] = if (byName) {
+        expected.flatMap { outAttr =>
+          query.resolveQuoted(outAttr.name, resolver) match {
+            case Some(inAttr) if inAttr.nullable && !outAttr.nullable =>
+              errors += s"Cannot write nullable values to non-null column '${outAttr.name}'"
+              None
+
+            case Some(inAttr) if !DataType.canWrite(outAttr.dataType, inAttr.dataType, resolver) =>
+              Some(upcast(inAttr, outAttr))
+
+            case Some(inAttr) =>
+              Some(inAttr) // matches nullability, datatype, and name
+
+            case _ =>
+              errors += s"Cannot find data for output column '${outAttr.name}'"
+              None
+          }
+        }
+
+      } else {
+        if (expected.size > query.output.size) {
+          throw new AnalysisException(
+            s"""Cannot write to '$tableName', not enough data columns:
+               |Table columns: ${expected.map(_.name).mkString(", ")}
+               |Data columns: ${query.output.map(_.name).mkString(", ")}""".stripMargin)
+        }
+
+        query.output.zip(expected).flatMap {
+          case (inAttr, outAttr) if inAttr.nullable && !outAttr.nullable =>
+            errors += s"Cannot write nullable values to non-null column '${outAttr.name}'"
+            None
+
+          case (inAttr, outAttr)
+              if !DataType.canWrite(inAttr.dataType, outAttr.dataType, resolver) ||
--- End diff --

I updated this to use your suggestion: now it always adds the cast.
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207043344

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
(same `ResolveOutputRelation` hunk as quoted in the previous comment, anchored at:)
+        query.output.zip(expected).flatMap {
--- End diff --

I refactored these into a helper method.
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r207043203

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
(same `ResolveOutputRelation` hunk as quoted in the previous comments, anchored at:)
+            case Some(inAttr) if inAttr.nullable && !outAttr.nullable =>
--- End diff --

I fixed this by always failing if `canWrite` returns false and always adding the `UpCast`. Now, `canWrite` will return true if the write type can be cast to the read type for atomic types, as determined by `Cast.canSafeCast`. Since it only returns a boolean, we always insert the cast and the optimizer should remove it if it isn't needed.

I also added better error messages. When an error is found, the check will add a clear error message by calling `addError: String => Unit`.
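The by-name resolution the rule implements can be sketched in miniature. This is an illustrative Python analogue, not Spark's code: `resolve_by_name` and its dict-based column representation are invented here to show the matching, nullability, and error-collection logic described above:

```python
# Illustrative sketch of by-name output resolution: match each table column
# to a query column by name, reject nullable data for non-null columns, and
# collect all errors before failing (rather than stopping at the first one).
def resolve_by_name(table_cols, query_cols):
    """table_cols / query_cols: dicts of name -> (dtype, nullable)."""
    errors, resolved = [], []
    for name, (out_type, out_nullable) in table_cols.items():
        match = query_cols.get(name)
        if match is None:
            errors.append(f"Cannot find data for output column '{name}'")
            continue
        in_type, in_nullable = match
        if in_nullable and not out_nullable:
            errors.append(f"Cannot write nullable values to non-null column '{name}'")
            continue
        # Record the target type unconditionally; in the approach discussed
        # above, a cast is always inserted and a later optimizer pass drops
        # casts that turn out to be no-ops.
        resolved.append((name, out_type))
    if errors:
        raise ValueError("Cannot write incompatible data:\n" + "\n".join(errors))
    return resolved
```

Collecting every error into one exception mirrors the `errors` buffer in the Scala rule: the user sees all incompatible columns at once instead of fixing them one by one.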
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21721 **[Test build #93907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93907/testReport)** for PR 21721 at commit [`3e5d9d8`](https://github.com/apache/spark/commit/3e5d9d8ee78176e68b0775a24886a68d021edafa).
[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r207042893

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -163,7 +163,8 @@ class SourceProgress protected[sql](
     val endOffset: String,
     val numInputRows: Long,
     val inputRowsPerSecond: Double,
-    val processedRowsPerSecond: Double) extends Serializable {
+    val processedRowsPerSecond: Double,
+    val customMetrics: Option[JValue] = None) extends Serializable {
--- End diff --

Refactored to Json String instead of JValue.
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @HyukjinKwon, I have addressed the comments and modified SourceProgress and SinkProgress to take a String instead of a JValue so that this can be easily used from Java. Regarding the default value in the ctor, I am not sure if it's an issue, because the object is mostly read-only and would only be a problem if the user tries to construct it from Java. I have added overloaded ctors anyway. Please take a look.
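The API shape being discussed (carry custom metrics as an opaque JSON string rather than a JSON AST type, so non-Scala callers need no `JValue` dependency) can be sketched as follows. `SourceProgressSketch` and its fields are hypothetical names for illustration, not Spark's actual class:

```python
# Hypothetical sketch: custom metrics travel as a JSON *string* supplied by
# the producer; the progress object only parses it when rendering its own
# JSON, and a default value stands in for the "no metrics" ctor overload.
import json

class SourceProgressSketch:
    def __init__(self, description, num_input_rows, custom_metrics=None):
        self.description = description
        self.num_input_rows = num_input_rows
        self.custom_metrics = custom_metrics  # JSON string or None

    def json(self):
        doc = {"description": self.description,
               "numInputRows": self.num_input_rows}
        if self.custom_metrics is not None:
            # Parse only at serialization time; producers just hand us a string.
            doc["customMetrics"] = json.loads(self.custom_metrics)
        return json.dumps(doc)
```

The design trade-off: a string keeps the public constructor friendly to Java (no Scala JSON library on the signature), at the cost of deferring malformed-JSON errors to serialization time.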
[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #93906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93906/testReport)** for PR 21469 at commit [`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).
[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 Retest this, please
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21622 Test failure looks unrelated. Jenkins, retest this, please
[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21915 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93889/ Test FAILed.
[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21915 Merged build finished. Test FAILed.
[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21915 **[Test build #93889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93889/testReport)** for PR 21915 at commit [`663b900`](https://github.com/apache/spark/commit/663b90004a962768d3fa718c6e3047e38e325519).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93890/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #93890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93890/testReport)** for PR 21898 at commit [`94d3671`](https://github.com/apache/spark/commit/94d36719e03cdbda969c97603c7f96552372a07e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r207037118

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---

```diff
@@ -669,6 +686,34 @@ private[spark] class AppStatusListener(
       }
     }
   }
+
+    // check if there is a new peak value for any of the executor level memory metrics
+    // for the live UI. SparkListenerExecutorMetricsUpdate events are only processed
+    // for the live UI.
+    event.executorUpdates.foreach { updates: ExecutorMetrics =>
+      liveExecutors.get(event.execId).foreach { exec: LiveExecutor =>
+        if (exec.peakExecutorMetrics.compareAndUpdatePeakValues(updates)) {
+          maybeUpdate(exec, now)
+        }
+      }
+    }
+  }
+
+  override def onStageExecutorMetrics(executorMetrics: SparkListenerStageExecutorMetrics): Unit = {
+    val now = System.nanoTime()
+
+    // check if there is a new peak value for any of the executor level memory metrics,
+    // while reading from the log. SparkListenerStageExecutorMetrics are only processed
+    // when reading logs.
+    liveExecutors.get(executorMetrics.execId)
+      .orElse(deadExecutors.get(executorMetrics.execId)) match {
+      case Some(exec) =>
```
--- End diff --

yeah, but you're talking about both a `foreach` *and* an `if` together. A long time back we discussed using `option.fold` for this, as it is all in one function, but we rejected it as being pretty confusing for most developers.

```scala
scala> def foo(x: Option[String]) = x.fold("nada")("some " + _)
foo: (x: Option[String])String

scala> foo(None)
res0: String = nada

scala> foo(Some("blah"))
res1: String = some blah
```
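For readers following the `foreach`-plus-`if` vs. `fold` discussion, a minimal standalone sketch of the two shapes being compared (toy functions, not the actual listener code):

```scala
// The two equivalent shapes under discussion. `fold` packs both branches into
// one expression; the explicit match spells them out, which the reviewers
// found easier to read for most developers.
object OptionShapes {
  def viaFold(x: Option[String]): String =
    x.fold("nada")("some " + _)

  def viaMatch(x: Option[String]): String = x match {
    case Some(s) => "some " + s
    case None    => "nada"
  }
}
```

Both produce identical results; the disagreement is purely about readability.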
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #93905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93905/testReport)** for PR 21608 at commit [`ae64b9d`](https://github.com/apache/spark/commit/ae64b9d7a89055a5156f57b1b09fcb9d0b4ee38a).
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 @gatorsmile, I have addressed the comments. Any other fix required?
[GitHub] spark pull request #21923: [SPARK-24918][Core] Executor Plugin api
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21923#discussion_r207036143

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---

```diff
@@ -130,6 +130,12 @@ private[spark] class Executor(
   private val urlClassLoader = createClassLoader()
   private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader)
+  Thread.currentThread().setContextClassLoader(replClassLoader)
```
--- End diff --

My memory monitor would be fine if the constructor were called in another thread. (It actually creates its own thread -- it has to, as it's going to continually poll.) What would be the advantage to calling the constructor in a separate thread? If it's just to protect against exceptions, we could just do a try/catch. If it's to ensure that we don't tie up the main executor threads ... well, even in another thread, the plugin could do something arbitrary to tie up all the resources associated with this executor (e.g. launch 30 threads and do something intensive in each one). Not opposed to having another thread, just want to understand why.
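As a rough illustration of the "just do a try/catch" alternative mentioned above, here is a hedged sketch of guarding plugin instantiation so a failing plugin cannot take down the executor. The `ExecutorPlugin` trait and `initPlugins` helper are assumptions for illustration, not the API this PR actually adds:

```scala
// Hypothetical plugin interface (assumption, not Spark's actual API).
trait ExecutorPlugin {
  def init(): Unit
}

object PluginLoader {
  // Instantiate each configured plugin class reflectively, dropping any that
  // fail to load or initialize instead of propagating the exception.
  def initPlugins(pluginClasses: Seq[String]): Seq[ExecutorPlugin] = {
    pluginClasses.flatMap { className =>
      try {
        val plugin = Class.forName(className)
          .getConstructor()
          .newInstance()
          .asInstanceOf[ExecutorPlugin]
        plugin.init()
        Some(plugin)
      } catch {
        case e: Exception =>
          // A broken plugin should not abort executor startup.
          System.err.println(s"Failed to initialize plugin $className: $e")
          None
      }
    }
  }
}
```

As the reviewer notes, this only contains exceptions; it does not stop a plugin from monopolizing executor resources once loaded.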
[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21921 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93887/ Test PASSed.
[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21921 Merged build finished. Test PASSed.
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21946 Merged build finished. Test PASSed.
[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21921 **[Test build #93887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93887/testReport)** for PR 21921 at commit [`d6a93b1`](https://github.com/apache/spark/commit/d6a93b162b4ced2c9ab33715cfbe8d196e6140e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class RateStreamContinuousReader(options: DataSourceOptions) extends ContinuousReader ` * `class TextSocketMicroBatchReader(options: DataSourceOptions) extends MicroBatchReader with Logging `
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93888/ Test PASSed.
[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r207035320

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ---

```diff
@@ -78,7 +93,8 @@ object CommandUtils extends Logging {
     val size = if (fileStatus.isDirectory) {
       fs.listStatus(path)
         .map { status =>
-          if (!status.getPath.getName.startsWith(stagingDir)) {
+          if (!status.getPath.getName.startsWith(stagingDir) &&
+              DataSourceUtils.isDataPath(path)) {
```
--- End diff --

Added a line to migration doc.
[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21946 **[Test build #93888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93888/testReport)** for PR 21946 at commit [`19808d5`](https://github.com/apache/spark/commit/19808d500a869114d84383f23056483316e52a33). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r207035227

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---

```diff
@@ -1449,6 +1449,13 @@ object SQLConf {
       .intConf
       .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
       .createWithDefault(Deflater.DEFAULT_COMPRESSION)
+
+  val COMPUTE_STATS_LIST_FILES_IN_PARALLEL =
+    buildConf("spark.sql.execution.computeStatsListFilesInParallel")
+      .internal()
+      .doc("If True, File listing for compute statistics is done in parallel.")
```
--- End diff --

Thanks! I have made this change.
[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...
Github user Achuth17 commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r207035140

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---

```diff
@@ -1449,6 +1449,13 @@ object SQLConf {
       .intConf
       .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
       .createWithDefault(Deflater.DEFAULT_COMPRESSION)
+
+  val COMPUTE_STATS_LIST_FILES_IN_PARALLEL =
+    buildConf("spark.sql.execution.computeStatsListFilesInParallel")
```
--- End diff --

Done.
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21733 @tdas I've applied your review comments except documentation. (Will add WIP to the PR's title if it sounds clearer.) There may be more review comments you'd like to add, so I'd like to work on documentation once the patch is in a "ready to merge" shape. Otherwise, I'll try to find the time/resources and run the performance tests again, but it might take a couple of days or more. Will update once I run them and get new numbers. In the meantime, please continue reviewing the code; that would help ensure the tests run against the latest updated patch.
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21884 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/1583/
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21884 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1583/ Test PASSed.
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21884 Merged build finished. Test PASSed.
[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21892 Great! Let us wait for the 2.7.3 build? @jbax When will it be released?
[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21947 Merged build finished. Test PASSed.
[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21947 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93902/ Test PASSed.
[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21947 **[Test build #93902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93902/testReport)** for PR 21947 at commit [`a59672b`](https://github.com/apache/spark/commit/a59672b6dd9ce095f96446048fe7059a3f0711ae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r207032024

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala ---

```diff
@@ -56,9 +57,14 @@ class FailureSafeParser[IN](
     }
   }

+  private val skipParsing = optimizeEmptySchema && schema.isEmpty
   def parse(input: IN): Iterator[InternalRow] = {
     try {
-      rawParser.apply(input).toIterator.map(row => toResultRow(Some(row), () => null))
+      if (skipParsing) {
+        Iterator.single(InternalRow.empty)
+      } else {
+        rawParser.apply(input).toIterator.map(row => toResultRow(Some(row), () => null))
+      }
```
--- End diff --

both? If we introduce a behavior change, we need to document it in the migration guide and add a conf. Users can use the conf to revert back to the previous behavior.
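To make the conf-gated pattern being requested concrete, here is a minimal sketch: the empty-schema fast path is on by default, and a flag can restore the old behavior. The class shape and flag wiring are simplified assumptions for illustration, not the actual `FailureSafeParser` or any real Spark conf:

```scala
// Simplified stand-in for the parser under review. When the schema is empty
// and the optimization is enabled, emit one empty row per record without
// invoking the underlying parser at all; otherwise parse normally.
class SkippingParser[IN](
    rawParse: IN => Iterator[Seq[String]],
    schemaIsEmpty: Boolean,
    optimizeEmptySchema: Boolean) {  // conf-controlled escape hatch

  private val skipParsing = optimizeEmptySchema && schemaIsEmpty

  def parse(input: IN): Iterator[Seq[String]] =
    if (skipParsing) Iterator.single(Seq.empty)
    else rawParse(input)
}
```

Setting `optimizeEmptySchema = false` reverts to always calling the raw parser, which is the kind of opt-out the reviewer asks to document in the migration guide.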
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19449 LGTM pending Jenkins.
[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17185 **[Test build #93904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93904/testReport)** for PR 17185 at commit [`05d6c0f`](https://github.com/apache/spark/commit/05d6c0fc6139f8672aea3173da3d98c5cc4e1a29).
[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21948 lgtm
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21884 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/1583/
[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21892 @jbax It became much faster:

```
Parsing quoted values:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
One quoted string               33411 / 33510          0.0      668211.4       1.0X

Wide rows with 1000 columns:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
Select 1000 columns             88028 / 89311          0.0       88028.1       1.0X
Select 100 columns              29010 / 32755          0.0       29010.1       3.0X
Select one column               22936 / 22953          0.0       22936.5       3.8X
count()                         22790 / 23143          0.0       22789.6       3.9X
```

The `count()` benchmark is still slower because I reverted the optimization for empty schema. Before, we didn't call `uniVocity`'s `parseLine` if the set of selected indexes was empty. In this PR, I call `parseLine` for the empty set since the bug (https://github.com/uniVocity/univocity-parsers/issues/250) has been fixed. It seems it performs similarly to the case when only one column is selected. So, the overhead per line is around 15.5 milliseconds on my CPU.
[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...
Github user skambha commented on the issue: https://github.com/apache/spark/pull/17185 I rebased and found out that the resolution code in LogicalPlan has changed and it uses a map lookup to do the matching. I have some ideas on how to incorporate the 3-part name with the map lookup logic. For now, I have synced up and bypassed the new map logic and pushed the code so it is up to the latest, so I can get the test cycle from GitHub, and added TODOs for myself to incorporate the optimized lookup logic. @cloud-fan, in the meantime, if you have any other comments please let me know. Thanks.
[GitHub] spark pull request #21884: [SPARK-24960][K8S] explicitly expose ports on dri...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21884
[GitHub] spark pull request #21883: [SPARK-24937][SQL] Datasource partition table sho...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21883
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19449 **[Test build #4229 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4229/testReport)** for PR 19449 at commit [`253bc19`](https://github.com/apache/spark/commit/253bc19af270185e6d419a9ed0261917f84688c1).
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21884 Thanks @adelbertc for the contribution!
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21733 **[Test build #93903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93903/testReport)** for PR 21733 at commit [`b4a3807`](https://github.com/apache/spark/commit/b4a3807631cc8e12df367eeca554749fdd81a5ef).
[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21883 LGTM Thanks! Merged to master.
[GitHub] spark issue #21488: [SPARK-18057][SS] Update Kafka client version from 0.10....
Github user ijuma commented on the issue: https://github.com/apache/spark/pull/21488 @wangyum, can you please file a Kafka JIRA with details of what the test is doing (even if the failure is transient)? From the stacktrace, it looks like a potential broker issue (assuming there are no real disk issues where these tests were executed). If there is indeed a new issue (we have to verify since the test seems to be transient), it would likely only affect tests.
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21884 Thanks @shaneknapp, I'll merge this now.
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19449 Merged build finished. Test FAILed.
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19449 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93895/ Test FAILed.
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19449 **[Test build #93895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93895/testReport)** for PR 19449 at commit [`253bc19`](https://github.com/apache/spark/commit/253bc19af270185e6d419a9ed0261917f84688c1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21947 **[Test build #93902 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93902/testReport)** for PR 21947 at commit [`a59672b`](https://github.com/apache/spark/commit/a59672b6dd9ce095f96446048fe7059a3f0711ae).
[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21947 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1582/ Test PASSed.
[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21947 Merged build finished. Test PASSed.
[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...
Github user skambha commented on a diff in the pull request: https://github.com/apache/spark/pull/17185#discussion_r207027392

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ---

```diff
@@ -121,14 +129,14 @@ abstract class Attribute extends LeafExpression with NamedExpression with NullIn
  * @param name The name to be associated with the result of computing [[child]].
  * @param exprId A globally unique id used to check if an [[AttributeReference]] refers to this
  *               alias. Auto-assigned if left blank.
- * @param qualifier An optional string that can be used to referred to this attribute in a fully
- *                  qualified way. Consider the examples tableName.name, subQueryAlias.name.
- *                  tableName and subQueryAlias are possible qualifiers.
+ * @param qualifier An optional Seq of string that can be used to refer to this attribute in a
+ *                  fully qualified way. Consider the examples tableName.name, subQueryAlias.name.
+ *                  tableName and subQueryAlias are possible qualifiers.
  * @param explicitMetadata Explicit metadata associated with this alias that overwrites child's.
  */
 case class Alias(child: Expression, name: String)(
     val exprId: ExprId = NamedExpression.newExprId,
-    val qualifier: Option[String] = None,
+    val qualifier: Option[Seq[String]] = None,
```
--- End diff --

I'll look into this.
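The diff above widens the qualifier from a single name (`Option[String]`) to a multi-part name (`Option[Seq[String]]`), so that e.g. `db.table.column` can be represented. A toy sketch of how such a qualifier composes into a fully qualified name (the `Attr` class is illustrative, not Spark's `Alias`/`AttributeReference`):

```scala
// Minimal illustration of a multi-part qualifier: the qualifier parts
// (e.g. Seq("db", "tbl")) are joined with the attribute name by dots.
case class Attr(name: String, qualifier: Option[Seq[String]] = None) {
  def qualifiedName: String =
    (qualifier.getOrElse(Nil) :+ name).mkString(".")
}
```

With a single-`String` qualifier, only one level (`tbl.col`) could be expressed; the `Seq` form supports the three-part resolution this PR targets.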
[GitHub] spark pull request #21884: [SPARK-24960][K8S] explicitly expose ports on dri...
Github user adelbertc commented on a diff in the pull request: https://github.com/apache/spark/pull/21884#discussion_r207026446

--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala ---

```diff
@@ -203,4 +212,12 @@ class BasicDriverFeatureStepSuite extends SparkFunSuite {
       "spark.files" -> "https://localhost:9000/file1.txt,/opt/spark/file2.txt")
     assert(additionalProperties === expectedSparkConf)
   }
+
+  def containerPort(name: String, portNumber: Int): ContainerPort = {
+    val port = new ContainerPort()
```
--- End diff --

Done
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21884 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93901/ Test PASSed.
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21884 Merged build finished. Test PASSed.
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21884 **[Test build #93901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93901/testReport)** for PR 21884 at commit [`4215fc2`](https://github.com/apache/spark/commit/4215fc26c49a71b8366e198a04afc32182e2fb2c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21884 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/1581/