[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93899/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21935
  
**[Test build #93899 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93899/testReport)**
 for PR 21935 at commit 
[`09ad6e9`](https://github.com/apache/spark/commit/09ad6e9f022740182312b29e20d5ff52778f63ed).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-01 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
@holdenk Did we miss the window?  I still count 10 undefined names in this 
repo.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21950
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21950: [SPARK-24912][SQL][WIP] Add configuration to avoid OOM d...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21950
  
**[Test build #93912 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93912/testReport)**
 for PR 21950 at commit 
[`aa2a957`](https://github.com/apache/spark/commit/aa2a957751a906fe538822cace019014e763a8c3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21950: [SPARK-24912][SQL][WIP] Add configuration to avoi...

2018-08-01 Thread bersprockets
GitHub user bersprockets opened a pull request:

https://github.com/apache/spark/pull/21950

[SPARK-24912][SQL][WIP] Add configuration to avoid OOM during broadcast 
join (and other negative side effects of incorrect table sizing)

## What changes were proposed in this pull request?

Added configuration settings to help avoid OOM errors during broadcast 
joins.

- deser multiplication factor: Tell Spark to multiply totalSize times a 
specified factor for tables with encoded files (i.e., parquet or orc files). 
Spark will do this when calculating a table's sizeInBytes. This is modelled 
after Hive's hive.stats.deserialization.factor configuration setting.
- ignore rawDataSize: Due to HIVE-20079, rawDataSize is broken. This 
settings tells Spark to ignore rawDataSize when calculating the table's 
sizeInBytes.

One can partially simulate the deser multiplication factor without this 
change by decreasing the value in spark.sql.autoBroadcastJoinThreshold. 
However, that will affect all tables, not just the ones that are encoded.

There is some awkwardness in that the check for file type (parquet or orc) 
uses Hive deser names, but the checks for partitioned tables need to be made 
outside of the Hive submodule. Still working that out.

## How was this patch tested?

Added unit tests.

Also, checked that I can avoid broadcast join OOM errors when using the 
deser multiplication factor on both my laptop and a cluster. Also checked that 
I can avoid OOM errors using the ignore rawDataSize flag on my laptop.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bersprockets/spark SPARK-24914

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21950.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21950


commit aa2a957751a906fe538822cace019014e763a8c3
Author: Bruce Robbins 
Date:   2018-07-26T00:36:17Z

WIP version




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21950: [SPARK-24912][SQL][WIP] Add configuration to avoid OOM d...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21950
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21950: [SPARK-24912][SQL][WIP] Add configuration to avoid OOM d...

2018-08-01 Thread holdensmagicalunicorn
Github user holdensmagicalunicorn commented on the issue:

https://github.com/apache/spark/pull/21950
  
@bersprockets, thanks! I am a bot who has found some folks who might be 
able to help with the review:@cloud-fan, @gatorsmile and @rxin


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21898
  
**[Test build #93911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93911/testReport)**
 for PR 21898 at commit 
[`94d3671`](https://github.com/apache/spark/commit/94d36719e03cdbda969c97603c7f96552372a07e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1587/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21898
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21946
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21946
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93896/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21948
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21948
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93898/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21949
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93900/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21946
  
**[Test build #93896 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93896/testReport)**
 for PR 21946 at commit 
[`1f0c9a7`](https://github.com/apache/spark/commit/1f0c9a79dc1dc625ad5c821f87f8cdf6f471f445).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](`
  * `  sealed class Hasher[@specialized(Long, Int, Double, Float) T] 
extends Serializable `
  * `  class DoubleHasher extends Hasher[Double] `
  * `  class FloatHasher extends Hasher[Float] `
  * `case class ArrayUnion(left: Expression, right: Expression) extends 
ArraySetLike`
  * `case class ArrayExcept(left: Expression, right: Expression) extends 
ArraySetLike`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21948
  
**[Test build #93898 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93898/testReport)**
 for PR 21948 at commit 
[`852c6f3`](https://github.com/apache/spark/commit/852c6f332bd8f7264cd9c3aae6325e3c84c80ff5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21949
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21949: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal fo...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21949
  
**[Test build #93900 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93900/testReport)**
 for PR 21949 at commit 
[`1f817a0`](https://github.com/apache/spark/commit/1f817a058887090ebf41c510a3b6086a062433d6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21915
  
**[Test build #93910 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93910/testReport)**
 for PR 21915 at commit 
[`663b900`](https://github.com/apache/spark/commit/663b90004a962768d3fa718c6e3047e38e325519).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1586/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21915
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21305
  
**[Test build #93909 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93909/testReport)**
 for PR 21305 at commit 
[`922dc16`](https://github.com/apache/spark/commit/922dc164f6950b78223e2e421643bcce5b72a787).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1585/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21946
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21946
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93891/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21946
  
**[Test build #93891 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93891/testReport)**
 for PR 21946 at commit 
[`6cac2b5`](https://github.com/apache/spark/commit/6cac2b54014d6b46913d4b86e77a6955cf0bc1e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/21946
  
+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21305
  
**[Test build #93908 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93908/testReport)**
 for PR 21305 at commit 
[`ac7cb13`](https://github.com/apache/spark/commit/ac7cb13644a3f80c1627826513fa525154cf2d00).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1584/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21946
  
@rdblue This change is pretty isolated. It also LGTM to me. 

Since you are fine about the change, I am assuming you are not blocking 
this. I will merge this soon. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r207043490
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2217,6 +2218,100 @@ class Analyzer(
 }
   }
 
+  /**
+   * Resolves columns of an output table from the data in a logical plan. 
This rule will:
+   *
+   * - Reorder columns when the write is by name
+   * - Insert safe casts when data types do not match
+   * - Insert aliases when column names do not match
+   * - Detect plans that are not compatible with the output table and 
throw AnalysisException
+   */
+  object ResolveOutputRelation extends Rule[LogicalPlan] {
+override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case append @ AppendData(table, query, isByName)
+  if table.resolved && query.resolved && !append.resolved =>
+val projection = resolveOutputColumns(table.name, table.output, 
query, isByName)
+
+if (projection != query) {
+  append.copy(query = projection)
+} else {
+  append
+}
+}
+
+def resolveOutputColumns(
+tableName: String,
+expected: Seq[Attribute],
+query: LogicalPlan,
+byName: Boolean): LogicalPlan = {
+
+  if (expected.size < query.output.size) {
+throw new AnalysisException(
+  s"""Cannot write to '$tableName', too many data columns:
+ |Table columns: ${expected.map(_.name).mkString(", ")}
+ |Data columns: ${query.output.map(_.name).mkString(", 
")}""".stripMargin)
+  }
+
+  val errors = new mutable.ArrayBuffer[String]()
+  val resolved: Seq[NamedExpression] = if (byName) {
+expected.flatMap { outAttr =>
+  query.resolveQuoted(outAttr.name, resolver) match {
+case Some(inAttr) if inAttr.nullable && !outAttr.nullable =>
+  errors += s"Cannot write nullable values to non-null column 
'${outAttr.name}'"
+  None
+
+case Some(inAttr) if !DataType.canWrite(outAttr.dataType, 
inAttr.dataType, resolver) =>
+  Some(upcast(inAttr, outAttr))
+
+case Some(inAttr) =>
+  Some(inAttr) // matches nullability, datatype, and name
+
+case _ =>
+  errors += s"Cannot find data for output column 
'${outAttr.name}'"
+  None
+  }
+}
+
+  } else {
+if (expected.size > query.output.size) {
+  throw new AnalysisException(
+s"""Cannot write to '$tableName', not enough data columns:
+   |Table columns: ${expected.map(_.name).mkString(", ")}
+   |Data columns: ${query.output.map(_.name).mkString(", 
")}""".stripMargin)
+}
+
+query.output.zip(expected).flatMap {
+  case (inAttr, outAttr) if inAttr.nullable && !outAttr.nullable =>
+errors += s"Cannot write nullable values to non-null column 
'${outAttr.name}'"
+None
+
+  case (inAttr, outAttr)
+if !DataType.canWrite(inAttr.dataType, outAttr.dataType, 
resolver) ||
--- End diff --

I updated this to use your suggestion: now it always adds the cast.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r207043344
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2217,6 +2218,100 @@ class Analyzer(
 }
   }
 
+  /**
+   * Resolves columns of an output table from the data in a logical plan. 
This rule will:
+   *
+   * - Reorder columns when the write is by name
+   * - Insert safe casts when data types do not match
+   * - Insert aliases when column names do not match
+   * - Detect plans that are not compatible with the output table and 
throw AnalysisException
+   */
+  object ResolveOutputRelation extends Rule[LogicalPlan] {
+override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case append @ AppendData(table, query, isByName)
+  if table.resolved && query.resolved && !append.resolved =>
+val projection = resolveOutputColumns(table.name, table.output, 
query, isByName)
+
+if (projection != query) {
+  append.copy(query = projection)
+} else {
+  append
+}
+}
+
+def resolveOutputColumns(
+tableName: String,
+expected: Seq[Attribute],
+query: LogicalPlan,
+byName: Boolean): LogicalPlan = {
+
+  if (expected.size < query.output.size) {
+throw new AnalysisException(
+  s"""Cannot write to '$tableName', too many data columns:
+ |Table columns: ${expected.map(_.name).mkString(", ")}
+ |Data columns: ${query.output.map(_.name).mkString(", 
")}""".stripMargin)
+  }
+
+  val errors = new mutable.ArrayBuffer[String]()
+  val resolved: Seq[NamedExpression] = if (byName) {
+expected.flatMap { outAttr =>
+  query.resolveQuoted(outAttr.name, resolver) match {
+case Some(inAttr) if inAttr.nullable && !outAttr.nullable =>
+  errors += s"Cannot write nullable values to non-null column 
'${outAttr.name}'"
+  None
+
+case Some(inAttr) if !DataType.canWrite(outAttr.dataType, 
inAttr.dataType, resolver) =>
+  Some(upcast(inAttr, outAttr))
+
+case Some(inAttr) =>
+  Some(inAttr) // matches nullability, datatype, and name
+
+case _ =>
+  errors += s"Cannot find data for output column 
'${outAttr.name}'"
+  None
+  }
+}
+
+  } else {
+if (expected.size > query.output.size) {
+  throw new AnalysisException(
+s"""Cannot write to '$tableName', not enough data columns:
+   |Table columns: ${expected.map(_.name).mkString(", ")}
+   |Data columns: ${query.output.map(_.name).mkString(", 
")}""".stripMargin)
+}
+
+query.output.zip(expected).flatMap {
--- End diff --

I refactored these into a helper method.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-01 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r207043203
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2217,6 +2218,100 @@ class Analyzer(
 }
   }
 
+  /**
+   * Resolves columns of an output table from the data in a logical plan. 
This rule will:
+   *
+   * - Reorder columns when the write is by name
+   * - Insert safe casts when data types do not match
+   * - Insert aliases when column names do not match
+   * - Detect plans that are not compatible with the output table and 
throw AnalysisException
+   */
+  object ResolveOutputRelation extends Rule[LogicalPlan] {
+override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case append @ AppendData(table, query, isByName)
+  if table.resolved && query.resolved && !append.resolved =>
+val projection = resolveOutputColumns(table.name, table.output, 
query, isByName)
+
+if (projection != query) {
+  append.copy(query = projection)
+} else {
+  append
+}
+}
+
+def resolveOutputColumns(
+tableName: String,
+expected: Seq[Attribute],
+query: LogicalPlan,
+byName: Boolean): LogicalPlan = {
+
+  if (expected.size < query.output.size) {
+throw new AnalysisException(
+  s"""Cannot write to '$tableName', too many data columns:
+ |Table columns: ${expected.map(_.name).mkString(", ")}
+ |Data columns: ${query.output.map(_.name).mkString(", 
")}""".stripMargin)
+  }
+
+  val errors = new mutable.ArrayBuffer[String]()
+  val resolved: Seq[NamedExpression] = if (byName) {
+expected.flatMap { outAttr =>
+  query.resolveQuoted(outAttr.name, resolver) match {
+case Some(inAttr) if inAttr.nullable && !outAttr.nullable =>
--- End diff --

I fixed this by always failing if `canWrite` returns false and always 
adding the `UpCast`.

Now, `canWrite` will return true if the write type can be cast to the read 
type for atomic types, as determined by `Cast.canSafeCast`. Since it only 
returns a boolean, we always insert the cast and the optimizer should remove it 
if it isn't needed.

I also added better error messages. When an error is found, the check will 
add a clear error message by calling `addError: String => Unit`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21721
  
**[Test build #93907 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93907/testReport)**
 for PR 21721 at commit 
[`3e5d9d8`](https://github.com/apache/spark/commit/3e5d9d8ee78176e68b0775a24886a68d021edafa).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-08-01 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21721#discussion_r207042893
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -163,7 +163,8 @@ class SourceProgress protected[sql](
   val endOffset: String,
   val numInputRows: Long,
   val inputRowsPerSecond: Double,
-  val processedRowsPerSecond: Double) extends Serializable {
+  val processedRowsPerSecond: Double,
+  val customMetrics: Option[JValue] = None) extends Serializable {
--- End diff --

Refactored to Json String instead of JValue.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-01 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21721
  
@HyukjinKwon , have addressed the comments and modified SourceProgress and 
SinkProgress to take String instead of JValue so that this can be easily used 
from Java. Regarding the default value in the ctor, I am not sure if its an 
issue because the object is mostly read only and would be an issue only if the 
user tries to construct it from Java. I have added overloaded ctors anyways. 
Please take a look.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #93906 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93906/testReport)**
 for PR 21469 at commit 
[`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-08-01 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21469
  
Retest this, please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-08-01 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21622
  
Test failure looks unrelated.

Jenkins, retest this, please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93889/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21915
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21915
  
**[Test build #93889 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93889/testReport)**
 for PR 21915 at commit 
[`663b900`](https://github.com/apache/spark/commit/663b90004a962768d3fa718c6e3047e38e325519).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93890/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21898
  
**[Test build #93890 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93890/testReport)**
 for PR 21898 at commit 
[`94d3671`](https://github.com/apache/spark/commit/94d36719e03cdbda969c97603c7f96552372a07e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r207037118
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -669,6 +686,34 @@ private[spark] class AppStatusListener(
 }
   }
 }
+
+// check if there is a new peak value for any of the executor level 
memory metrics
+// for the live UI. SparkListenerExecutorMetricsUpdate events are only 
processed
+// for the live UI.
+event.executorUpdates.foreach { updates: ExecutorMetrics =>
+  liveExecutors.get(event.execId).foreach { exec: LiveExecutor =>
+if (exec.peakExecutorMetrics.compareAndUpdatePeakValues(updates)) {
+  maybeUpdate(exec, now)
+}
+  }
+}
+  }
+
+  override def onStageExecutorMetrics(executorMetrics: 
SparkListenerStageExecutorMetrics): Unit = {
+val now = System.nanoTime()
+
+// check if there is a new peak value for any of the executor level 
memory metrics,
+// while reading from the log. SparkListenerStageExecutorMetrics are 
only processed
+// when reading logs.
+liveExecutors.get(executorMetrics.execId)
+  .orElse(deadExecutors.get(executorMetrics.execId)) match {
+  case Some(exec) =>
--- End diff --

yeah, but you're talking about both a `foreach` *and* an `if` together.

A long time back we discussed using `option.fold` for this, as it is all in 
one function, but we rejected it as being pretty confusing for most developers.

```scala
scala> def foo(x: Option[String]) = x.fold("nada")("some " + _)
foo: (x: Option[String])String

scala> foo(None)
res0: String = nada

scala> foo(Some("blah"))
res1: String = some blah
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21608
  
**[Test build #93905 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93905/testReport)**
 for PR 21608 at commit 
[`ae64b9d`](https://github.com/apache/spark/commit/ae64b9d7a89055a5156f57b1b09fcb9d0b4ee38a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on the issue:

https://github.com/apache/spark/pull/21608
  
@gatorsmile, I have addressed the comments. Any other fix required?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21923#discussion_r207036143
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -130,6 +130,12 @@ private[spark] class Executor(
   private val urlClassLoader = createClassLoader()
   private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader)
 
+  Thread.currentThread().setContextClassLoader(replClassLoader)
--- End diff --

My memory monitor would be fine if the constructor were called in another 
thread.  (It actually creates its own thread -- it has to, as its going to 
continually poll.)

What would be the advantage to calling the constructor in a separate 
thread?  If its just protect against exceptions, we could just do a try/catch.  
If its to ensure that we don't tie up the main executor threads ... well, even 
in another thread, the plugin could do something arbitrary to tie up all the 
resources associated with this executor (eg. launch 30 threads and do something 
intensive in each one).

Not opposed to having another thread, just want to understand why.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21921
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93887/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21921
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21946
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21921
  
**[Test build #93887 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93887/testReport)**
 for PR 21921 at commit 
[`d6a93b1`](https://github.com/apache/spark/commit/d6a93b162b4ced2c9ab33715cfbe8d196e6140e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class RateStreamContinuousReader(options: DataSourceOptions) extends 
ContinuousReader `
  * `class TextSocketMicroBatchReader(options: DataSourceOptions) extends 
MicroBatchReader with Logging `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21946
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93888/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21608#discussion_r207035320
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
 ---
@@ -78,7 +93,8 @@ object CommandUtils extends Logging {
   val size = if (fileStatus.isDirectory) {
 fs.listStatus(path)
   .map { status =>
-if (!status.getPath.getName.startsWith(stagingDir)) {
+if (!status.getPath.getName.startsWith(stagingDir) &&
+  DataSourceUtils.isDataPath(path)) {
--- End diff --

Added a line to migration doc.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21946
  
**[Test build #93888 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93888/testReport)**
 for PR 21946 at commit 
[`19808d5`](https://github.com/apache/spark/commit/19808d500a869114d84383f23056483316e52a33).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21608#discussion_r207035227
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1449,6 +1449,13 @@ object SQLConf {
 .intConf
 .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
 .createWithDefault(Deflater.DEFAULT_COMPRESSION)
+
+  val COMPUTE_STATS_LIST_FILES_IN_PARALLEL =
+buildConf("spark.sql.execution.computeStatsListFilesInParallel")
+  .internal()
+  .doc("If True, File listing for compute statistics is done in 
parallel.")
--- End diff --

Thanks! I have made this change,


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-08-01 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21608#discussion_r207035140
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1449,6 +1449,13 @@ object SQLConf {
 .intConf
 .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION)
 .createWithDefault(Deflater.DEFAULT_COMPRESSION)
+
+  val COMPUTE_STATS_LIST_FILES_IN_PARALLEL =
+buildConf("spark.sql.execution.computeStatsListFilesInParallel")
--- End diff --

Done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-08-01 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21733
  
@tdas 
I've applied your review comments except documentation. (Will add WIP to 
the PR's title if it sounds clearer) There may be something you can add the 
review comments and so I'd like to work on documentation when the patch is in a 
shape to "ready to merge".

Otherwise I'll try to find time/resource and run the performance tests 
again, but it might take couple of days or more to get it. Will update once I 
run and get new numbers. During the wait please continuous reviewing the code. 
It would help running the tests with latest updated patch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21884
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/1583/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21884
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1583/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21884
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21892
  
Great! Let us wait for 2.7.3 build? @jbax When will it be released?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21947
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21947
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93902/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21947
  
**[Test build #93902 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93902/testReport)**
 for PR 21947 at commit 
[`a59672b`](https://github.com/apache/spark/commit/a59672b6dd9ce095f96446048fe7059a3f0711ae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21909#discussion_r207032024
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
 ---
@@ -56,9 +57,14 @@ class FailureSafeParser[IN](
 }
   }
 
+  private val skipParsing = optimizeEmptySchema && schema.isEmpty
   def parse(input: IN): Iterator[InternalRow] = {
 try {
-  rawParser.apply(input).toIterator.map(row => toResultRow(Some(row), 
() => null))
+ if (skipParsing) {
+   Iterator.single(InternalRow.empty)
+ } else {
+   rawParser.apply(input).toIterator.map(row => toResultRow(Some(row), 
() => null))
--- End diff --

both? 

If we introduce a behavior change, we need to document it in the migration 
guide and add a conf. Users can do the conf to revert back to the previous 
behaviors. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19449
  
LGTM pending Jenkins.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17185
  
**[Test build #93904 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93904/testReport)**
 for PR 17185 at commit 
[`05d6c0f`](https://github.com/apache/spark/commit/05d6c0fc6139f8672aea3173da3d98c5cc4e1a29).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-01 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/21948
  
lgtm


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21884
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/1583/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-08-01 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21892
  
@jbax It became really faster:
```
Parsing quoted values:   Best/Avg Time(ms)Rate(M/s)   
Per Row(ns)   Relative


One quoted string   33411 / 33510  0.0  
668211.4   1.0X

Wide rows with 1000 columns: Best/Avg Time(ms)Rate(M/s)   
Per Row(ns)   Relative


Select 1000 columns 88028 / 89311  0.0  
 88028.1   1.0X
Select 100 columns  29010 / 32755  0.0  
 29010.1   3.0X
Select one column   22936 / 22953  0.0  
 22936.5   3.8X
count() 22790 / 23143  0.0  
 22789.6   3.9X
```
The `count()` benchmark is still slower because I reverted the optimization 
for empty schema. Before we didn't call `uniVocity`'s `parseLine` if the set of 
selected indexes is empty. In this PR, I call `parseLine` for empty set since 
the bug (https://github.com/uniVocity/univocity-parsers/issues/250) has been 
fixed. It seems it performs similar to the case when only one column is 
selected. So, the overhead per line is around 15.5 milliseconds on my CPU.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-08-01 Thread skambha
Github user skambha commented on the issue:

https://github.com/apache/spark/pull/17185
  
I rebased and found out that the resolution code in Logical plan has 
changed and it uses map lookup to do the matching. I have some ideas on how to 
incorporate the 3 part name with the map lookup logic.  

For now,  I have synced up and bypassed the new map logic and pushed the 
code so it is up to the latest so I can get the test cycle from GitHub and 
added todo’s for myself to incorporate the optimized lookup logic. 

@cloud-fan,  In the meantime, if you have any other comments please let me 
know.  Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21884: [SPARK-24960][K8S] explicitly expose ports on dri...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21884


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21883: [SPARK-24937][SQL] Datasource partition table sho...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21883


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19449
  
**[Test build #4229 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4229/testReport)**
 for PR 19449 at commit 
[`253bc19`](https://github.com/apache/spark/commit/253bc19af270185e6d419a9ed0261917f84688c1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21884
  
Thanks @adelbertc for the contribution!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21733
  
**[Test build #93903 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93903/testReport)**
 for PR 21733 at commit 
[`b4a3807`](https://github.com/apache/spark/commit/b4a3807631cc8e12df367eeca554749fdd81a5ef).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21883: [SPARK-24937][SQL] Datasource partition table should loa...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21883
  
LGTM

Thanks! Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21488: [SPARK-18057][SS] Update Kafka client version from 0.10....

2018-08-01 Thread ijuma
Github user ijuma commented on the issue:

https://github.com/apache/spark/pull/21488
  
@wangyum, can you please file a Kafka JIRA with details of what the test is 
doing (even if the failure is transient)? From the stacktrace, it looks like a 
potential broker issue (assuming there are no real disk issues where these 
tests were executed). If there is indeed a new issue (we have to verify since 
the test seems to be transient), it would likely only affect tests.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21884
  
Thanks @shaneknapp, I'll merge this now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19449
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19449
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93895/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19449
  
**[Test build #93895 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93895/testReport)**
 for PR 19449 at commit 
[`253bc19`](https://github.com/apache/spark/commit/253bc19af270185e6d419a9ed0261917f84688c1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21947
  
**[Test build #93902 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93902/testReport)**
 for PR 21947 at commit 
[`a59672b`](https://github.com/apache/spark/commit/a59672b6dd9ce095f96446048fe7059a3f0711ae).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21947
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1582/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21947: [MINOR][DOCS] Add note about Spark network security

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21947
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-01 Thread skambha
Github user skambha commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r207027392
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 ---
@@ -121,14 +129,14 @@ abstract class Attribute extends LeafExpression with 
NamedExpression with NullIn
  * @param name The name to be associated with the result of computing 
[[child]].
  * @param exprId A globally unique id used to check if an 
[[AttributeReference]] refers to this
  *   alias. Auto-assigned if left blank.
- * @param qualifier An optional string that can be used to referred to 
this attribute in a fully
- *   qualified way. Consider the examples tableName.name, 
subQueryAlias.name.
- *   tableName and subQueryAlias are possible qualifiers.
+ * @param qualifier An optional Seq of string that can be used to refer to 
this attribute in a
+ *  fully qualified way. Consider the examples 
tableName.name, subQueryAlias.name.
+ *  tableName and subQueryAlias are possible qualifiers.
  * @param explicitMetadata Explicit metadata associated with this alias 
that overwrites child's.
  */
 case class Alias(child: Expression, name: String)(
 val exprId: ExprId = NamedExpression.newExprId,
-val qualifier: Option[String] = None,
+val qualifier: Option[Seq[String]] = None,
--- End diff --

I'll look into this. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21884: [SPARK-24960][K8S] explicitly expose ports on dri...

2018-08-01 Thread adelbertc
Github user adelbertc commented on a diff in the pull request:

https://github.com/apache/spark/pull/21884#discussion_r207026446
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala
 ---
@@ -203,4 +212,12 @@ class BasicDriverFeatureStepSuite extends 
SparkFunSuite {
   "spark.files" -> 
"https://localhost:9000/file1.txt,/opt/spark/file2.txt;)
 assert(additionalProperties === expectedSparkConf)
   }
+
+  def containerPort(name: String, portNumber: Int): ContainerPort = {
+val port = new ContainerPort()
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21884
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93901/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21884
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21884
  
**[Test build #93901 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93901/testReport)**
 for PR 21884 at commit 
[`4215fc2`](https://github.com/apache/spark/commit/4215fc26c49a71b8366e198a04afc32182e2fb2c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21884: [SPARK-24960][K8S] explicitly expose ports on driver con...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21884
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/1581/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



<    1   2   3   4   5   6   7   8   9   >