[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-03-22 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20657
  
Thanks, merging to master branch!


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88535/
Test PASSed.


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20887
  
**[Test build #88535 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88535/testReport)**
 for PR 20887 at commit 
[`c6d12e8`](https://github.com/apache/spark/commit/c6d12e8fbed4478cf44787aefd753c49940ebfab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20888
  
Uploaded the logs to the JIRA. You're right about the second issue with `stageToKill`: `onJobStart` tries to cancel the same ID twice.


---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20888
  
I mean that on my machine the stage ID stays zero for a long time here:
```
DataFrameRangeSuite.stageToKill = TaskContext.get().stageId()
```
and after 200 seconds the other thread is still stuck on this (I increased the timeout to experiment):
```
eventually(timeout(300.seconds), interval(1.millis)) {
  assert(DataFrameRangeSuite.stageToKill != 
DataFrameRangeSuite.INVALID_STAGE_ID)
}
```
and `cancelStage` is never called. Based on that, the first non-invalid stage ID on my machine is 0, and the current code misses that edge case.

When I run the whole suite, other tests pre-initialize something and the ID is no longer 0.



---




[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

2018-03-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20633
  
@jkbradley Thanks for your comments! I've addressed them. Please review it 
again. Thank you.


---




[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19840
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19840
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1718/
Test PASSed.


---




[GitHub] spark issue #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange when cac...

2018-03-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20831
  
Thanks! @cloud-fan


---




[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...

2018-03-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20861
  
@hvanhovell Ok. But this needs #20817. Since #20817 just adds a new class and doesn't change existing code, I think it can be merged directly into 2.3. Should I create a backport PR for it too, or can you backport it directly?


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176636849
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = 
buildConf("spark.sql.parquet.filterPushdown.date")
+.doc("If true, enables Parquet filter push-down optimization for Date. 
" +
+  "This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' is enabled.")
+.internal()
+.booleanConf
+.createWithDefault(false)
--- End diff --

An internal, by-default-false conf usually means it's not available to users...
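
For reference, a minimal sketch of how the proposed flag would be toggled if it ships as shown in the diff above (the conf names come from that diff; the dataset path and column name are made up):
```scala
// Sketch only: assumes the conf shown in the diff above exists in the build in use.
import java.sql.Date
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

spark.conf.set("spark.sql.parquet.filterPushdown", "true")       // existing master switch
spark.conf.set("spark.sql.parquet.filterPushdown.date", "true")  // proposed date-specific switch

// With both enabled, a Date equality predicate becomes a candidate for Parquet push-down.
spark.read.parquet("/tmp/events")                                // made-up dataset path
  .filter($"event_date" === Date.valueOf("2018-03-22"))
  .explain()                                                     // check PushedFilters in the scan node
```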


---




[GitHub] spark pull request #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange w...

2018-03-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20831


---




[GitHub] spark issue #20831: [SPARK-23614][SQL] Fix incorrect reuse exchange when cac...

2018-03-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20831
  
thanks, merging to master/2.3!


---




[GitHub] spark pull request #20878: [MINOR][PYTHON] Remove unused codes in schema par...

2018-03-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20878


---




[GitHub] spark issue #20878: [MINOR][PYTHON] Remove unused codes in schema parsing lo...

2018-03-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20878
  
thanks, merging to master!


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20887
  
Hi, @gatorsmile .
Could you review this PR?


---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20888
  
> if I execute the test alone on my machine, it never passes.

You mean it never fails on your machine, right? It's only flaky when you run everything on Jenkins?


---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20888
  
Hmm, you're right, I was looking at a different branch in my editor and didn't notice that it was reset in the code I linked to on master, oops.

I still don't understand your proposed solution, though -- how is checking `stageToKill != -1` better than checking `stageToKill > 0` in this case?




---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20208
  
Yes


---




[GitHub] spark issue #20868: [SPARK-23750][SQL] Inner Join Elimination based on Infor...

2018-03-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20868
  
cc @hvanhovell @cloud-fan @rxin 


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176633578
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,12 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = 
buildConf("spark.sql.parquet.filterPushdown.date")
+.doc("If true, enables Parquet filter push-down optimization for Date. 
" +
--- End diff --

Yes, it should be an internal conf. In Spark 3.0 release, we will revisit 
all the internal confs and remove all the unnecessary confs.


---




[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...

2018-03-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20541
  
I don't agree. `a + 1`/`a + b` are evaluated the same number of times, no matter whether you push them through Union or not. I don't see any performance benefit from doing this, except that you can eliminate the entire Project above the Union.


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20858
  
@maropu Maybe you can help @mn-mikke review this PR? I will open an umbrella JIRA for the built-in functions we plan to add in Apache Spark 2.4. The list includes multiple functions for operating on nested data.


---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20888
  
Just as additional info: if I execute the test alone on my machine, it never passes.


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176631936
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code 
generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of NullSafeEvaluation.
+   * If a class utilizing NullSaveEvaluation override [[nullable]], 
probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
+val values = children.map(_.eval(input))
+if (values.contains(null)) null
+else nullSafeEval(values)
+  }
+
+  /**
+   * Called by default [[eval]] implementation. If a class utilizing 
NullSaveEvaluation keep
+   * the default nullability, they can override this method to save 
null-check code.  If we need
+   * full control of evaluation process, we should override [[eval]].
+   */
+  protected def nullSafeEval(inputs: Seq[Any]): Any =
+sys.error(s"The class utilizing NullSaveEvaluation must override 
either eval or nullSafeEval")
+
+  /**
+   * Short hand for generating of null save evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f accepts a sequence of variable names and returns Java code 
to compute the output.
+   */
+  protected def defineCodeGen(
+ctx: CodegenContext,
+ev: ExprCode,
+f: Seq[String] => String): ExprCode = {
+nullSafeCodeGen(ctx, ev, values => {
+  s"${ev.value} = ${f(values)};"
+})
+  }
+
+  /**
+   * Called by expressions to generate null safe evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f a function that accepts a sequence of non-null evaluation 
result names of children
+   *  and returns Java code to compute the output.
+   */
+  protected def nullSafeCodeGen(
--- End diff --

We will combine it with `concat`


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176631836
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -408,6 +408,7 @@ object FunctionRegistry {
 expression[MapValues]("map_values"),
 expression[Size]("size"),
 expression[SortArray]("sort_array"),
+expression[ConcatArrays]("concat_arrays"),
--- End diff --

How about moving it to the collection functions?


---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20888
  
Where do you think the reset should happen? There is already one inside `withSQLConf` which resets it before the job is submitted.

Regarding the ID, I've just taken a look at the original implementation and I see it killing stage ID 0 
[here](https://github.com/apache/spark/commit/4064574d031215fcfdf899a1ee9f3b6fecb1bfc9). What have I missed?


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176631255
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code 
generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of NullSafeEvaluation.
+   * If a class utilizing NullSaveEvaluation override [[nullable]], 
probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
+val values = children.map(_.eval(input))
+if (values.contains(null)) null
+else nullSafeEval(values)
+  }
+
+  /**
+   * Called by default [[eval]] implementation. If a class utilizing 
NullSaveEvaluation keep
+   * the default nullability, they can override this method to save 
null-check code.  If we need
+   * full control of evaluation process, we should override [[eval]].
+   */
+  protected def nullSafeEval(inputs: Seq[Any]): Any =
+sys.error(s"The class utilizing NullSaveEvaluation must override 
either eval or nullSafeEval")
+
+  /**
+   * Short hand for generating of null save evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f accepts a sequence of variable names and returns Java code 
to compute the output.
+   */
+  protected def defineCodeGen(
+ctx: CodegenContext,
+ev: ExprCode,
+f: Seq[String] => String): ExprCode = {
+nullSafeCodeGen(ctx, ev, values => {
+  s"${ev.value} = ${f(values)};"
+})
+  }
+
+  /**
+   * Called by expressions to generate null safe evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f a function that accepts a sequence of non-null evaluation 
result names of children
+   *  and returns Java code to compute the output.
+   */
+  protected def nullSafeCodeGen(
--- End diff --

This method looks almost the same as the one in `BinaryExpression`. Can you avoid the code duplication?


---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20888
  
I think you're right about killing the wrong stage, but I don't think it's exactly what you've outlined. The original code doesn't try to kill a stage with ID == 0 -- instead it's just waiting until that volatile is set to something > 0, and then proceeds. That seems to work fine; we do see that the stage gets canceled OK once.

However, I think the problem is that the test [runs twice, with and without codegen](https://github.com/apache/spark/blob/4d37008c78d7d6b8f8a649b375ecc090700eca4f/sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala#L165). The first time, it'll always wait until the stage ID is set, because of that `eventually { ... stageToKill > 0}`.

However, on the second iteration, that `stageToKill` may still be > 0 from the first iteration, not because it has been set by the second iteration. So I think you just need to reset the value to -1 between iterations.
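
A minimal, self-contained sketch of that pitfall and the reset idea (the sentinel object here is hypothetical and only mimics what the suite does with `stageToKill`; the job thread is a stand-in for the real Range job):
```scala
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

object StageSentinel {
  final val InvalidStageId = -1
  @volatile var stageToKill: Int = InvalidStageId
}

for (codegenEnabled <- Seq(true, false)) {   // the real test runs once per codegen setting
  // Without this reset, the second iteration can observe the value left over
  // from the first iteration and "succeed" before its own job has started.
  StageSentinel.stageToKill = StageSentinel.InvalidStageId

  // Stand-in for the job thread that records the stage ID it runs in.
  new Thread(new Runnable {
    override def run(): Unit = {
      Thread.sleep(50)
      StageSentinel.stageToKill = 0   // the first stage ID really can be 0
    }
  }).start()

  eventually(timeout(10.seconds), interval(1.millis)) {
    // Compare against the sentinel rather than `> 0`, so stage ID 0 is accepted.
    assert(StageSentinel.stageToKill != StageSentinel.InvalidStageId)
  }
  // In the real test this is where sparkContext.cancelStage(StageSentinel.stageToKill) happens.
}
```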


---




[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...

2018-03-22 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/20541
  
Oh, yeah, there is a little difference between `a + 1` and `a + b`.
**for a + 1**:
```
`PushProjectionThroughUnion `rule handles:
Union
:- Project [(a#0 + 1) AS aa#10]
:  +- LocalRelation , [a#0, b#1, c#2]
:- Project [(d#3 + 1) AS aa#11]
:  +- LocalRelation , [d#3, e#4, f#5]
+- Project [(g#6 + 1) AS aa#12]
   +- LocalRelation , [g#6, h#7, i#8]

`ColumnPruning `rule handles:
Project [(a#0 + 1) AS aa#9]
Union
:- Project [a#0]
:  +- LocalRelation , [a#0, b#1, c#2]
:- Project [d#3]
:  +- LocalRelation , [d#3, e#4, f#5]
+- Project [g#6]
   +- LocalRelation , [g#6, h#7, i#8]
```
  
**for a + b**:
```
`PushProjectionThroughUnion `rule handles:
Union
:- Project [(a#0 + b#1) AS ab#10]
:  +- LocalRelation , [a#0, b#1, c#2]
:- Project [(d#3 + e#4) AS ab#11]
:  +- LocalRelation , [d#3, e#4, f#5]
+- Project [(g#6 + h#7) AS ab#12]
   +- LocalRelation , [g#6, h#7, i#8]

`ColumnPruning `rule handles:
Project [(a#0 + b#1) AS ab#9]
Union
:- Project [a#0, b#1]
:  +- LocalRelation , [a#0, b#1, c#2]
:- Project [d#3, e#4]
:  +- LocalRelation , [d#3, e#4, f#5]
+- Project [g#6, h#7]
   +- LocalRelation , [g#6, h#7, i#8]
```
  
So I think this may be the reason for needing the `PushProjectionThroughUnion` rule, and for applying it to non-deterministic expressions as well.
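
For reference, a small self-contained way to reproduce and inspect these plans (column names mirror the plan dumps above; the session setup is boilerplate, not part of the PR):
```scala
import org.apache.spark.sql.SparkSession

object UnionProjectPlans {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("union-project-plans").getOrCreate()
    import spark.implicits._

    val r1 = Seq((1, 2, 3)).toDF("a", "b", "c")
    val r2 = Seq((4, 5, 6)).toDF("d", "e", "f")
    val r3 = Seq((7, 8, 9)).toDF("g", "h", "i")

    // Union resolves columns by position, so the result keeps r1's names (a, b, c).
    val unioned = r1.union(r2).union(r3)

    // Project above the Union; the optimizer decides whether and how to push it down.
    unioned.select(($"a" + 1).as("aa")).explain(true)     // the `a + 1` case
    unioned.select(($"a" + $"b").as("ab")).explain(true)  // the `a + b` case

    spark.stop()
  }
}
```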



---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20888
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88533/
Test PASSed.


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20887
  
**[Test build #88533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88533/testReport)**
 for PR 20887 at commit 
[`78cf26a`](https://github.com/apache/spark/commit/78cf26a6c5ba4a68bb9a09db2f6ebb9edc6c92ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20888
  
**[Test build #88536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88536/testReport)**
 for PR 20888 at commit 
[`42c930d`](https://github.com/apache/spark/commit/42c930d694e0bbc66974516b6719a698d664f681).


---




[GitHub] spark pull request #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wa...

2018-03-22 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request:

https://github.com/apache/spark/pull/20888

[SPARK-23775][TEST] DataFrameRangeSuite should wait for first stage

## What changes were proposed in this pull request?

DataFrameRangeSuite.test("Cancelling stage in a query with Range.") sometimes gets stuck in an infinite loop and times out the build.

I presume the original intention of this test is to start a job with range and just cancel it. The submitted job has 2 stages, but I think the author tried to cancel the first stage, with ID 0, which is not the case here:

```
eventually(timeout(10.seconds), interval(1.millis)) {
  assert(DataFrameRangeSuite.stageToKill > 0)
}
```

All in all, if the first stage is slower than 10 seconds, it throws TestFailedDueToTimeoutException and cancelStage will never be called.

This PR changes the test behaviour to wait for the first valid task ID and 
cancel that one.

## How was this patch tested?

Existing unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaborgsomogyi/spark SPARK-23775

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20888


commit 42c930d694e0bbc66974516b6719a698d664f681
Author: Gabor Somogyi 
Date:   2018-03-23T02:37:27Z

[SPARK-23775][TEST] DataFrameRangeSuite should wait for first stage




---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20887
  
**[Test build #88535 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88535/testReport)**
 for PR 20887 at commit 
[`c6d12e8`](https://github.com/apache/spark/commit/c6d12e8fbed4478cf44787aefd753c49940ebfab).


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1717/
Test PASSed.


---




[GitHub] spark issue #20884: [SPARK-23773][SQL] JacksonGenerator does not include key...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20884
  
**[Test build #4143 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4143/testReport)**
 for PR 20884 at commit 
[`9faf853`](https://github.com/apache/spark/commit/9faf8533d044bd667bac6fb1925f2d38c4d281d4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19876
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19876
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88534/
Test PASSed.


---




[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19876
  
**[Test build #88534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88534/testReport)**
 for PR 19876 at commit 
[`9075626`](https://github.com/apache/spark/commit/9075627f708e271f3bf502a67c28c8810fa7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20883: [SPARK-23759][UI] Unable to bind Spark UI to specific ho...

2018-03-22 Thread felixalbani
Github user felixalbani commented on the issue:

https://github.com/apache/spark/pull/20883
  
@gerashegalov Updated the PR title


---




[GitHub] spark pull request #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python ...

2018-03-22 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/20842#discussion_r176605304
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -347,6 +347,20 @@ def r2(self):
 """
 return self._call_java("r2")
 
+@property
+@since("2.4.0")
+def r2adj(self):
+"""
+Returns Adjusted R^2^, the adjusted coefficient of determination.
--- End diff --

`R^2^` is a scaladoc format. How about just `R^2`? And could you fix it in `def r2` also?


---




[GitHub] spark pull request #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python ...

2018-03-22 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/20842#discussion_r176606317
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -347,6 +347,20 @@ def r2(self):
 """
 return self._call_java("r2")
 
+@property
+@since("2.4.0")
+def r2adj(self):
+"""
+Returns Adjusted R^2^, the adjusted coefficient of determination.
+
+.. seealso:: `Wikipedia coefficient of determination \
+`
--- End diff --

Use the same link from the scaladoc: 
https://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2

It also doesn't generate properly; you need a trailing `_`, I believe (also need to fix `def r2` if you don't mind).

It should be:
```
.. seealso:: `Wikipedia coefficient of determination \
    <https://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2>`_
```

If you are able to, building the docs to check these is a good idea


---




[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19876
  
**[Test build #88534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88534/testReport)**
 for PR 19876 at commit 
[`9075626`](https://github.com/apache/spark/commit/9075627f708e271f3bf502a67c28c8810fa7).


---




[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19876
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19876
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1716/
Test PASSed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88530/
Test FAILed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88530/testReport)**
 for PR 20886 at commit 
[`d584c9b`](https://github.com/apache/spark/commit/d584c9b6dd97addf0d993fa4a6dfd85fd2b94a95).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should ...

2018-03-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20887#discussion_r176604372
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1152,7 +1152,18 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
* Create a [[Cast]] expression.
*/
   override def visitCast(ctx: CastContext): Expression = withOrigin(ctx) {
-Cast(expression(ctx.expression), visitSparkDataType(ctx.dataType))
+typedVisit[DataType](ctx.dataType) match {
+  case t: CharType =>
+validate(t.length > 0, s"Char length ${t.length} is out of range 
[1, 255]", ctx)
--- End diff --

This one and line 1161 use the same error message as Hive.


---




[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-03-22 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19876
  
See some related thoughts in how to support Spark in kubeflow: 
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/119


---




[GitHub] spark issue #20883: [SPARK-23759][UI] Unable to bind Spark2 history server t...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20883
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20883: [SPARK-23759][UI] Unable to bind Spark2 history server t...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20883
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88528/
Test PASSed.


---




[GitHub] spark issue #20883: [SPARK-23759][UI] Unable to bind Spark2 history server t...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20883
  
**[Test build #88528 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88528/testReport)**
 for PR 20883 at commit 
[`4bed6be`](https://github.com/apache/spark/commit/4bed6be070883c8a845a6cc651b0e056fe976538).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1715/
Test PASSed.


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20887
  
**[Test build #88533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88533/testReport)**
 for PR 20887 at commit 
[`78cf26a`](https://github.com/apache/spark/commit/78cf26a6c5ba4a68bb9a09db2f6ebb9edc6c92ef).


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should ...

2018-03-22 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/20887

[SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncate the values

## What changes were proposed in this pull request?

This PR aims to fix the following `CAST` behavior on `CHAR/VARCHAR` types.

**Spark (2.2.1 and 2.3.0)**
```scala
scala> sql("SELECT CAST('123' AS CHAR(1)), CAST('123' AS VARCHAR(1))").show
+-------------------+-------------------+
|CAST(123 AS STRING)|CAST(123 AS STRING)|
+-------------------+-------------------+
|                123|                123|
+-------------------+-------------------+
```

**Hive (1.2.1 ~ 2.3.2)**
```
hive> SELECT CAST('123' AS CHAR(1)), CAST('123' AS VARCHAR(1));
OK
1   1
```

## How was this patch tested?

Pass the Jenkins with newly added test cases and existing one

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-23774

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20887.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20887


commit 78cf26a6c5ba4a68bb9a09db2f6ebb9edc6c92ef
Author: Dongjoon Hyun 
Date:   2018-03-22T21:37:22Z

[SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncate the values




---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20885
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20885
  
**[Test build #88532 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88532/testReport)**
 for PR 20885 at commit 
[`77112ef`](https://github.com/apache/spark/commit/77112ef5b12d4738914c78b46c25d058e6201b61).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20885
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88532/
Test FAILed.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176599395
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
 ---
@@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
 checkResult(finalState, result)
   }
 
+  private def testClusterDriverBind(
+  uiEnabled: Boolean,
+  localHost: String,
+  localIp: String,
+  success: Boolean): Unit = {
+val result = File.createTempFile("result", null, tempDir)
+val finalState = runSpark(false, 
mainClassName(YarnClusterDriver.getClass),
+  appArgs = Seq(result.getAbsolutePath()),
+  extraConf = Map(
+"spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
+"spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
+"spark.ui.enabled" -> uiEnabled.toString
+  ))
+if (success) {
+  checkResult(finalState, result, "success")
+} else {
+  finalState should be (SparkAppHandle.State.FAILED)
+}
+  }
+
+  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
--- End diff --

> RM is not actively involved here. 

Right, I meant NM. The rest of the comment about why using Spark's bind address is not correct still stands, though.


---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20885
  
**[Test build #88532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88532/testReport)**
 for PR 20885 at commit 
[`77112ef`](https://github.com/apache/spark/commit/77112ef5b12d4738914c78b46c25d058e6201b61).


---




[GitHub] spark issue #20883: [SPARK-23759][UI] Unable to bind Spark2 history server t...

2018-03-22 Thread gerashegalov
Github user gerashegalov commented on the issue:

https://github.com/apache/spark/pull/20883
  
This PR's title should reference Spark UI in general as opposed to just SHS


---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20885
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88531/
Test FAILed.


---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20885
  
**[Test build #88531 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88531/testReport)**
 for PR 20885 at commit 
[`6d13d00`](https://github.com/apache/spark/commit/6d13d0022e9062bdd2a5bcf8e7073bdce855b9fc).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-22 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20877
  
@HyukjinKwon We have a few clients who are interested in processing JSON-streaming-like data. Here is the PR which combines your changes and mine: 
https://github.com/apache/spark/pull/20885


---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20885
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20885
  
**[Test build #88531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88531/testReport)**
 for PR 20885 at commit 
[`6d13d00`](https://github.com/apache/spark/commit/6d13d0022e9062bdd2a5bcf8e7073bdce855b9fc).


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176598095
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
 ---
@@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
 checkResult(finalState, result)
   }
 
+  private def testClusterDriverBind(
+  uiEnabled: Boolean,
+  localHost: String,
+  localIp: String,
+  success: Boolean): Unit = {
+val result = File.createTempFile("result", null, tempDir)
+val finalState = runSpark(false, 
mainClassName(YarnClusterDriver.getClass),
+  appArgs = Seq(result.getAbsolutePath()),
+  extraConf = Map(
+"spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
+"spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
+"spark.ui.enabled" -> uiEnabled.toString
+  ))
+if (success) {
+  checkResult(finalState, result, "success")
+} else {
+  finalState should be (SparkAppHandle.State.FAILED)
+}
+  }
+
+  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
--- End diff --

I will see what to do with this PR since #20883 is going faster


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176597452
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
 ---
@@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
 checkResult(finalState, result)
   }
 
+  private def testClusterDriverBind(
+  uiEnabled: Boolean,
+  localHost: String,
+  localIp: String,
+  success: Boolean): Unit = {
+val result = File.createTempFile("result", null, tempDir)
+val finalState = runSpark(false, 
mainClassName(YarnClusterDriver.getClass),
+  appArgs = Seq(result.getAbsolutePath()),
+  extraConf = Map(
+"spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
+"spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
+"spark.ui.enabled" -> uiEnabled.toString
+  ))
+if (success) {
+  checkResult(finalState, result, "success")
+} else {
+  finalState should be (SparkAppHandle.State.FAILED)
+}
+  }
+
+  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
--- End diff --

> it assumes that both the RM and the Spark app have the same configuration w.r.t. which interfaces they're binding to.

The RM is not actively involved here. The driver executes on the NM. The driver's launch context prescribes assigning `SPARK_LOCAL_IP` and `SPARK_LOCAL_HOSTNAME` on the worker node. Then the AM RPCs the tracking URL computed on the NM back to the RM.
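
For context, a hedged sketch of how these knobs are passed (the keys mirror the test diff above; the host and IP values are placeholders):
```scala
import org.apache.spark.SparkConf

// spark.yarn.appMasterEnv.* entries are forwarded as environment variables to the
// AM container, which hosts the driver in yarn-cluster mode.
val conf = new SparkConf()
  .set("spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME", "nm-host.example.com")
  .set("spark.yarn.appMasterEnv.SPARK_LOCAL_IP", "10.0.0.12")
  .set("spark.ui.enabled", "true")
```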




---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20886
  
**[Test build #88530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88530/testReport)**
 for PR 20886 at commit 
[`d584c9b`](https://github.com/apache/spark/commit/d584c9b6dd97addf0d993fa4a6dfd85fd2b94a95).


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1714/
Test PASSed.


---




[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20886
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20886: [WIP][SPARK-19724][SQL]create a managed table wit...

2018-03-22 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/20886

[WIP][SPARK-19724][SQL]create a managed table with an existed default table 
should throw an exception

## What changes were proposed in this pull request?
This PR finishes https://github.com/apache/spark/pull/17272

This JIRA is follow-up work after SPARK-19583.

As we discussed in that PR, the following DDL for a managed table with an existing default location should throw an exception:

CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
CREATE TABLE ... (PARTITIONED BY ...)

Currently there are some situations which are not consistent with the above logic:

- CREATE TABLE ... (PARTITIONED BY ...) succeeds with an existing default location, for both Hive and data source tables (with HiveExternalCatalog/InMemoryCatalog).
- CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ... : a Hive table succeeds with an existing default location.

This PR makes the above two situations consistent with the logic that table creation should throw an exception when the default location already exists.
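
A hedged sketch of the scenario being described (warehouse-path handling is simplified; the table name and schema are made up, and the expected failures reflect the behavior this PR proposes):
```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Pre-create the table's default location under the warehouse directory.
val warehouse = spark.conf.get("spark.sql.warehouse.dir").stripPrefix("file:")
Files.createDirectories(Paths.get(warehouse, "t1"))

// With this PR, both forms are expected to fail because the default location
// already exists, instead of silently creating the table over it.
spark.sql("CREATE TABLE t1 (a INT, b INT) USING parquet PARTITIONED BY (a)")
spark.sql("CREATE TABLE t1 USING parquet PARTITIONED BY (a) AS SELECT 1 AS a, 2 AS b")
```
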
## How was this patch tested?

unit test added




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark pr-17272

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20886.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20886


commit d584c9b6dd97addf0d993fa4a6dfd85fd2b94a95
Author: windpiger 
Date:   2018-03-22T13:19:21Z

[SPARK-19724][SQL]create a managed table with an existed default table 
should throw an exception




---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20885
  
Build finished. Test FAILed.


---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20885
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88529/
Test FAILed.


---




[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20885
  
**[Test build #88529 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88529/testReport)**
 for PR 20885 at commit 
[`f99c1e1`](https://github.com/apache/spark/commit/f99c1e16f2ad90c2a94e8c4b206b5b740506e136).
 * This patch **fails Python style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20885
  
**[Test build #88529 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88529/testReport)**
 for PR 20885 at commit 
[`f99c1e1`](https://github.com/apache/spark/commit/f99c1e16f2ad90c2a94e8c4b206b5b740506e136).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20885: [SPARK-23724][SPARK-23765][SQL] Line separator for the j...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20885
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20885: [SPARK-23724][SPARK-23765][SQL] Line separator fo...

2018-03-22 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/20885

[SPARK-23724][SPARK-23765][SQL] Line separator for the json datasource

## What changes were proposed in this pull request?

Currently, 
[TextInputJsonDataSource](https://github.com/databricks/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala#L86)
 uses 
[HadoopFileLinesReader](https://github.com/databricks/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala#L125)
 to split a json file into separate lines. The latter 
[splits](https://github.com/databricks/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala#L125)
 json lines via 
[LineRecordReader](https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java#L68)
 without providing a recordDelimiter. As a consequence, the hadoop library 
[reads lines terminated by one of CR, LF, or 
CRLF](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L185-L254).
 The changes allow specifying the line separator instead of relying on the 
auto-detection in the hadoop library. If the separator is not specified, 
Hadoop's line separation method is used by default.
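
As a usage sketch only: the option name was still in flux across the commits 
below (recordDelimiter, lineSeparator), so the example assumes it is 
ultimately exposed as `lineSep`, and the output path is a placeholder.

```scala
import org.apache.spark.sql.SparkSession

object JsonLineSepSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lineSep-sketch")
      .getOrCreate()

    val out = "/tmp/json-custom-sep"   // placeholder path

    // Write three records separated by "||" instead of a newline.
    spark.range(3).selectExpr("id AS a")
      .write.mode("overwrite")
      .option("lineSep", "||")         // assumed option name
      .json(out)

    // Read them back with the same separator; without the option the Hadoop
    // reader would only split on CR/LF/CRLF and see one malformed record.
    spark.read.option("lineSep", "||").json(out).show()

    spark.stop()
  }
}
```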

## How was this patch tested?

Added new tests for writing/reading json files with custom line separator

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 json-line-sep

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20885.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20885


commit a794988407b6fd28364f5d993a6a52ac0b85ec5f
Author: Maxim Gekk 
Date:   2018-02-24T20:11:00Z

Adding the delimiter option encoded in base64

commit dccdaa2e97cb4e2f6f8ea7e03320cdb05a43668c
Author: Maxim Gekk 
Date:   2018-02-24T20:59:46Z

Separator encoded as a sequence of bytes in hex

commit d0abab7e4b74dd42e06972f9484bc712b8f11c63
Author: Maxim Gekk 
Date:   2018-02-24T21:06:08Z

Refactoring: removed unused imports and renaming a parameter

commit 674179601b4c82e315eb1156df0f3f5035e91154
Author: Maxim Gekk 
Date:   2018-03-04T17:24:42Z

The sep option is renamed to recordSeparator. The supported format is 
sequence of bytes in hex like x0d 0a

commit e4faae155cb5b0761da9ac72a12f67cdde6b2e6b
Author: Maxim Gekk 
Date:   2018-03-18T12:40:21Z

Renaming recordSeparator to recordDelimiter

commit 01f4ef584a2cc1ce460359f260ebbe22808d034e
Author: Maxim Gekk 
Date:   2018-03-18T13:17:59Z

Comments for the recordDelimiter option

commit 24cedb9d809b026fa36b01fb2b425918b43857df
Author: Maxim Gekk 
Date:   2018-03-18T14:36:31Z

Support other formats of recordDelimiter

commit d40dda22587deaf79cfad3b20ccf6854554fc11d
Author: Maxim Gekk 
Date:   2018-03-18T16:30:26Z

Checking different charsets and record delimiters

commit ad6496c6d9415bcd49630272b5d6c327ffcb1378
Author: Maxim Gekk 
Date:   2018-03-18T16:39:07Z

Renaming test's method to make it more readable

commit 358863d91bf0c0d9761aa13698eb7f8532e5fc90
Author: Maxim Gekk 
Date:   2018-03-18T17:20:38Z

Test of reading json in different charsets and delimiters

commit 7e5be5e2b4cf7f77914a0d91e74ea31ab8c272d0
Author: Maxim Gekk 
Date:   2018-03-18T20:25:47Z

Fix inferring of csv schema for any charsets

commit d138d2d4e7b6e0c3e46d73939ff06a875128d59d
Author: Maxim Gekk 
Date:   2018-03-18T21:02:44Z

Fix errors of scalastyle check

commit c26ef5d3d2a3970c80c973eec696805929bd7725
Author: Maxim Gekk 
Date:   2018-03-22T11:20:34Z

Reserving format for regular expressions and concatenated json

commit 5f0b0694f142bd69127c8991d83a24f528316b2b
Author: Maxim Gekk 
Date:   2018-03-22T20:18:21Z

Fix recordDelimiter tests

commit ef8248f862949becdb3d370ac94a1cfc1f7c3068
Author: Maxim Gekk 
Date:   2018-03-22T20:34:56Z

Additional cases are added to the delimiter test

commit 2efac082ea4e40b89b4d01274851c0dcdd49eb44
Author: Maxim Gekk 
Date:   2018-03-22T21:01:56Z

Renaming recordDelimiter to lineSeparator

commit b2020fa99584d03e1754a4a1b5991dce4875f448
Author: Maxim Gekk 
Date:   2018-03-22T21:38:33Z

Adding HyukjinKwon changes

commit f99c1e16f2ad90c2a94e8c4b206b5b740506e136
Author: Maxim Gekk 

[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176585546
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
 ---
@@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
     checkResult(finalState, result)
   }
 
+  private def testClusterDriverBind(
+      uiEnabled: Boolean,
+      localHost: String,
+      localIp: String,
+      success: Boolean): Unit = {
+    val result = File.createTempFile("result", null, tempDir)
+    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
+      appArgs = Seq(result.getAbsolutePath()),
+      extraConf = Map(
+        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
+        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
+        "spark.ui.enabled" -> uiEnabled.toString
+      ))
+    if (success) {
+      checkResult(finalState, result, "success")
+    } else {
+      finalState should be (SparkAppHandle.State.FAILED)
+    }
+  }
+
+  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
--- End diff --

> the regression of not being able to bind webUI to a specific interface is 
fixed

This can be a unit test for `JettyUtils` if you really care about that.

> they demonstrate how to bind RPC and webUI to different interfaces

This can be added to the docs. Nobody is going to look at unit test code to 
figure out how to do that.

> It's not wrong. It's one of the reasonable choices. 

It actually is, because it assumes that both the RM and the Spark app have 
the same configuration w.r.t. which interfaces they're binding to.

But as you say, it's better to use `$NM_HOST` to find the NM's address 
instead of the current code.
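
(For reference only, a minimal sketch of reading the NM address from the 
container environment; `Environment.NM_HOST` is the variable YARN exports to 
containers, and none of this is taken from the PR under review.)

```scala
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment

// Inside an AM/driver container YARN exports the NodeManager's host name;
// fall back to the local host name if the variable is not set.
val nmHost: String = sys.env.getOrElse(
  Environment.NM_HOST.name(),
  java.net.InetAddress.getLocalHost.getHostName)
```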

> We need the fix in JettyUtils

There's a separate PR with that fix too (the one I referenced above was 
closed and the correct one was opened as #20883).



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20884: [SPARK-23773][SQL] JacksonGenerator does not include key...

2018-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20884
  
**[Test build #4143 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4143/testReport)**
 for PR 20884 at commit 
[`9faf853`](https://github.com/apache/spark/commit/9faf8533d044bd667bac6fb1925f2d38c4d281d4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-22 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r176584238
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
 ---
@@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
     checkResult(finalState, result)
   }
 
+  private def testClusterDriverBind(
+      uiEnabled: Boolean,
+      localHost: String,
+      localIp: String,
+      success: Boolean): Unit = {
+    val result = File.createTempFile("result", null, tempDir)
+    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
+      appArgs = Seq(result.getAbsolutePath()),
+      extraConf = Map(
+        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
+        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
+        "spark.ui.enabled" -> uiEnabled.toString
+      ))
+    if (success) {
+      checkResult(finalState, result, "success")
+    } else {
+      finalState should be (SparkAppHandle.State.FAILED)
+    }
+  }
+
+  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
--- End diff --

> So I'm not sure the tests are actually that useful. The way they're 
written, as yarn apps, actually makes them very expensive, and this is testing 
basic networking config that we know will not work if the IPs are invalid.

I agree the tests are not cheap. However, they show:
- that the regression of not being able to bind the webUI to a specific 
interface is fixed
- how to bind RPC and the webUI to different interfaces

> The actual change you're introducing - using the bind address as the 
address of the NM, is actually wrong if you think about it. It just happens 
that the default value of that config is the local host name.

It's not wrong. It's one of the reasonable choices. Moreover, it is 
consistent with the setting executors use to bind RPC. Obviously there can 
be others.

> So basically if setting those env variables in the AM fixes the issue I'm 
not sure there's any need to change anything in Spark.

We need the fix in JettyUtils. After thinking about it more, 
YarnRMClient.scala should use NM_HOST for registerApplicationMaster, because 
that is the host part of YARN's NodeId.
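
(For illustration only, a sketch of the configuration shape the quoted test 
drives; the host and IP values are placeholders, and which bind address each 
variable ends up controlling is exactly what the test verifies.)

```scala
import org.apache.spark.SparkConf

// Hypothetical cluster-mode configuration: export the standard Spark env
// vars into the AM container so the driver picks them up when binding its
// endpoints. Host/IP values are placeholders, not recommendations.
val conf = new SparkConf()
  .set("spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME", "nm-host.example.com")
  .set("spark.yarn.appMasterEnv.SPARK_LOCAL_IP", "10.0.0.42")
  .set("spark.ui.enabled", "true")
```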



 




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...

2018-03-22 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20604
  
@vanzin @sitalkedia @jiangxb1987 I was looking at this code again, and I'd 
appreciate your thoughts on how this relates to 
[SPARK-21834](https://issues.apache.org/jira/browse/SPARK-21834)  
https://github.com/apache/spark/pull/19081

I actually think that SPARK-21834 probably solves the bug I was describing 
initially. I hit the bug on 2.2.0 and didn't properly understand the change 
in SPARK-21834 when proposing this one. Nonetheless, I still think this fix 
is a good one -- it improves code clarity in general and fixes a couple of 
other minor cases. I'd also link the issues in jira etc. so the relationship 
is clearer.

I'd go even further and suggest that with this fix in, we can actually 
remove SPARK-21834, as it's no longer necessary. It's not harmful, but it is 
just confusing.

thoughts?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20884: [SPARK-23773][SQL] JacksonGenerator does not include key...

2018-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20884
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20884: [SPARK-23773][SQL] JacksonGenerator does not include key...

2018-03-22 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/20884
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...

2018-03-22 Thread makagonov
GitHub user makagonov opened a pull request:

https://github.com/apache/spark/pull/20884

[SPARK-23773][SQL] JacksonGenerator does not include keys that have null 
value for StructTypes

## What changes were proposed in this pull request?
As stated in the Jira, when `toJSON` is called on a dataset, the resulting 
JSON string does not include keys for `StructType` fields whose value is 
null. This PR fixes the issue and writes such fields with a `null` value.
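
A small repro sketch of the behavior described above; the case classes are 
illustrative, and the "after" output reflects the behavior proposed here.

```scala
import org.apache.spark.sql.SparkSession

object NullStructToJsonSketch {
  case class Inner(x: Int)
  case class Outer(a: Int, s: Inner)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-struct-json")
      .getOrCreate()
    import spark.implicits._

    // The second row has a null struct column.
    val ds = Seq(Outer(1, Inner(10)), Outer(2, null)).toDS()

    // Before this patch the second record serializes as {"a":2} (the "s"
    // key is dropped); with the patch it becomes {"a":2,"s":null}.
    ds.toJSON.collect().foreach(println)

    spark.stop()
  }
}
```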

## How was this patch tested?
Added a unit test to `JsonSuite.scala`


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/makagonov/spark SPARK-23773

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20884.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20884


commit 9faf8533d044bd667bac6fb1925f2d38c4d281d4
Author: Sergey Makagonov 
Date:   2018-03-22T19:38:44Z

[SPARK-23773][SQL] JacksonGenerator does not include keys that have null 
value for StructTypes




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFactory i...

2018-03-22 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20397
  
In general I have very weak opinions on what classes are named :)

I agree that readers and writers are very different in the DataSourceV2 
API, and they're even more so in streaming-land. So a change emphasizing 
parallelism that isn't really there is more confusing than helpful. The only 
real commonality I see between ReadTask/DataReaderFactory and DataWriterFactory 
is that they are serializable intermediate representations.
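
(A schematic sketch of that commonality, with simplified names and 
signatures that are not Spark's real DataSourceV2 interfaces: both sides 
hand the driver a serializable object whose only job is to be shipped to an 
executor and create the per-partition reader or writer there.)

```scala
// Schematic only: simplified names and signatures, not Spark's real API.
trait DataReaderFactoryLike[T] extends Serializable {
  def createDataReader(): Iterator[T]                 // invoked on the executor
}

trait DataWriterFactoryLike[T] extends Serializable {
  def createDataWriter(partitionId: Int): T => Unit   // invoked on the executor
}
```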


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFactory i...

2018-03-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20397
  
what do streaming guys think? cc @tdas @jose-torres 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...

2018-03-22 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20861
  
@viirya can you create a backport for 2.3?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20883: [SPARK-23759][UI] Unable to bind Spark2 history server t...

2018-03-22 Thread felixalbani
Github user felixalbani commented on the issue:

https://github.com/apache/spark/pull/20883
  
Removed "Please review ..." remnants from template


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-22 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/20579
  
Thanks a lot @cloud-fan @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20883: [SPARK-23759][UI] Unable to bind Spark2 history server t...

2018-03-22 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20883
  
The change itself LGTM pending tests.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20883: [SPARK-23759][UI] Unable to bind Spark2 history server t...

2018-03-22 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20883
  
Also, please remove remnants of the template from the PR description 
("Please review ...").


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


