[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20847
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88387/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20847
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20847
  
**[Test build #88387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88387/testReport)** for PR 20847 at commit [`f130a36`](https://github.com/apache/spark/commit/f130a3603386acee670788d545a9d38969fcf39f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88395/testReport)** for PR 20829 at commit [`ab91545`](https://github.com/apache/spark/commit/ab91545bebc6e1d0c5c3cb7c15156d546ad48f81).


---




[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20807


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88388/
Test FAILed.


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20579
  
**[Test build #88388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88388/testReport)** for PR 20579 at commit [`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...

2018-03-19 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20807
  
Merging to master / 2.3.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20853
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88392/testReport)** for PR 20829 at commit [`9624061`](https://github.com/apache/spark/commit/9624061920bfe0a2cf06af054d9988c95cd0b322).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175629947
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
@@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
   def bind(): Unit = {
 assert(serverInfo.isEmpty, s"Attempted to bind $className more than 
once!")
 try {
-  val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
+  val host = if (Utils.isClusterMode(conf)) {
--- End diff --

There's a big difference between the RPC endpoint and the web UI.

For the RPC endpoint, your change is fixing a potential issue. For the web UI, this change is potentially introducing a new issue in non-YARN mode. So please revert this one change; all the rest is good to go.

Otherwise I can't merge this.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88392/
Test PASSed.


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20853
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88385/
Test PASSed.


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20853
  
**[Test build #88385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88385/testReport)** for PR 20853 at commit [`9f391de`](https://github.com/apache/spark/commit/9f391de914e3be5ed61e00d1c651349db3d52cb6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-03-19 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20657
  
I'm really sorry about the delay, @vanzin @squito. I will take another look today and get back to you.


---




[GitHub] spark pull request #20796: [SPARK-23649][SQL] Skipping chars disallowed in U...

2018-03-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20796#discussion_r175626064
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -187,8 +218,9 @@ public void writeTo(OutputStream out) throws 
IOException {
* @param b The first byte of a code point
*/
   private static int numBytesForFirstByte(final byte b) {
-final int offset = (b & 0xFF) - 192;
-return (offset >= 0) ? bytesOfCodePointInUTF8[offset] : 1;
+final int offset = b & 0xFF;
+byte numBytes = bytesOfCodePointInUTF8[offset];
+return (numBytes == 0) ? 1: numBytes; // Skip the first byte 
disallowed in UTF-8
--- End diff --

Is the comment valid? Do we skip it? Don't we still count the disallowed 
byte as one code point in `numChars`?
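For readers following the thread, the first-byte dispatch under discussion can be sketched as a standalone function. This is a minimal Python sketch, not Spark's actual Java implementation; the name `num_bytes_for_first_byte` simply mirrors the method under review:

```python
def num_bytes_for_first_byte(b: int) -> int:
    """Return the declared length of a UTF-8 sequence given its first
    byte (an int in 0..255). A continuation or disallowed first byte is
    stepped over as a single one-byte (invalid) code point, not dropped.
    """
    if b < 0x80:
        return 1                     # ASCII: 0x00..0x7F
    if 0xC2 <= b <= 0xDF:
        return 2                     # first byte of a 2-byte sequence
    if 0xE0 <= b <= 0xEF:
        return 3                     # first byte of a 3-byte sequence
    if 0xF0 <= b <= 0xF4:
        return 4                     # first byte of a 4-byte sequence
    return 1  # 0x80..0xC1 and 0xF5..0xFF are never valid first bytes
```

Note that, as the comment above asks, the invalid byte is still counted as one code point in this scheme; it is advanced over, not skipped from the count.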


---




[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...

2018-03-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r175625855
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -616,18 +616,22 @@ booleanValue
 ;
 
 interval
-: INTERVAL intervalField*
+: INTERVAL? intervalField+
 ;
 
 intervalField
-: value=intervalValue unit=identifier (TO to=identifier)?
+: value=intervalValue unit=intervalUnit (TO to=identifier)?
--- End diff --

Right, I checked the Hive parser and it seems to handle `TO` as units. But this is a refactoring issue, so is it okay to include this fix in this PR?


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88390/
Test PASSed.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88390/testReport)** for PR 20829 at commit [`2b1fd4e`](https://github.com/apache/spark/commit/2b1fd4e36fffefb1e1679221353426cb05320104).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20796: [SPARK-23649][SQL] Skipping chars disallowed in U...

2018-03-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20796#discussion_r175624764
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -57,12 +57,39 @@
   public Object getBaseObject() { return base; }
   public long getBaseOffset() { return offset; }
 
-  private static int[] bytesOfCodePointInUTF8 = {2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2,
-2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
-3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
-4, 4, 4, 4, 4, 4, 4, 4,
-5, 5, 5, 5,
-6, 6};
+  /**
+   * A char in UTF-8 encoding can take 1-4 bytes depending on the first 
byte which
+   * indicates the size of the char. See Unicode standard in page 126:
+   * http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf
+   *
+   * Binary  Hex  Comments
+   * 0xxx  0x00..0x7F   Only byte of a 1-byte character encoding
+   * 10xx  0x80..0xBF   Continuation bytes (1-3 continuation bytes)
+   * 110x  0xC0..0xDF   First byte of a 2-byte character encoding
--- End diff --

Yes, `0xC2..0xDF` should be the first-byte range for a 2-byte character encoding, but here you said `0xC0..0xDF`?
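The distinction matters because `0xC0` and `0xC1` can only start overlong encodings, which well-formed UTF-8 forbids. A strict decoder rejects them, as this Python aside illustrates (context for the review comment, not part of the patch):

```python
# 0xC0 0x80 is the overlong two-byte encoding of U+0000. A strict UTF-8
# decoder raises on it, and the "replace" error handler maps each
# offending byte to U+FFFD (the replacement character).
data = b"\xc0\x80"
try:
    data.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

replaced = data.decode("utf-8", errors="replace")
```

So a lookup table that treats `0xC0..0xC1` as starting a 2-byte sequence would accept byte sequences that conforming decoders reject.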


---




[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20796
  
**[Test build #88394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88394/testReport)** for PR 20796 at commit [`5557a80`](https://github.com/apache/spark/commit/5557a80d4674e929332d9441342e5b90e314eb45).


---




[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...

2018-03-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r175624411
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -1479,15 +1479,13 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   e.message.contains("Cannot save interval data type into external 
storage")
 })
 
-val e1 = intercept[AnalysisException] {
-  sql("select interval")
--- End diff --

yea, I'll revert this.


---




[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20796
  
retest this please.


---




[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20796
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88384/
Test FAILed.


---




[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20796
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...

2018-03-19 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/20856
  
BTW, I've just realized that even without the issue it's clear that creating a new `FileSourceScanExec` will end up with an NPE from the `supportsBatch` field.


---




[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20796
  
**[Test build #88384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88384/testReport)** for PR 20796 at commit [`5557a80`](https://github.com/apache/spark/commit/5557a80d4674e929332d9441342e5b90e314eb45).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-19 Thread gerashegalov
Github user gerashegalov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20327#discussion_r175623955
  
--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
@@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
   def bind(): Unit = {
 assert(serverInfo.isEmpty, s"Attempted to bind $className more than 
once!")
 try {
-  val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
+  val host = if (Utils.isClusterMode(conf)) {
--- End diff --

I formulated the problem more broadly in the title of the PR: "NM host for driver end points". It's not an intuitive default behavior to bind to `0.0.0.0` if the backend (YARN) is explicitly configured not to, and we need a mechanism to allow Spark to inherit the YARN-determined bind address on the NM.
You convinced me that the client mode is less critical and is easy to override via the environment of spark-submit (after the bug fix), though I'd prefer using bindAddress everywhere, for consistency and because it's documented.
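To make the trade-off concrete, the host-selection logic being debated can be sketched as a plain function. This is an illustrative Python sketch with hypothetical names; Spark's real code reads `SparkConf`, `Utils.isClusterMode`, and the process environment:

```python
def web_ui_bind_host(cluster_mode, bind_address, env):
    """Pick the web UI bind host.

    In cluster mode, prefer a backend-provided (e.g. NM-determined)
    bind address; otherwise fall back to the historical behavior of
    SPARK_LOCAL_IP or the 0.0.0.0 wildcard.
    """
    if cluster_mode and bind_address:
        return bind_address
    return env.get("SPARK_LOCAL_IP", "0.0.0.0")
```

The reviewer's objection above is precisely about the first branch: in non-YARN cluster managers there may be no backend-determined address, so changing the web UI path risks regressing the old wildcard behavior.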


---




[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...

2018-03-19 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/20856
  
I spent over 2 days applying different modifications to the query, hoping I could cut the number of `CASE WHEN`s and other projections, but noticed no correlation with their number or their "types". I'll see if renaming the columns leads to the issue and submit a test case. Thanks for your support!


---




[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...

2018-03-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r175622969
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/interval.sql ---
@@ -0,0 +1,184 @@
+-- Basic tests for intervals
+
+select
+  '1' second,
+  2  seconds,
--- End diff --

Yea. Since `days`, `seconds`, ... seem to be non-SQL-standard and out of scope here, I'll drop these in this PR.


---




[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20827
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1620/
Test PASSed.


---




[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20827
  
**[Test build #88393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88393/testReport)** for PR 20827 at commit [`bee3711`](https://github.com/apache/spark/commit/bee3711074a7d34cf19e8794f837b70eddaffbe0).


---




[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20827
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20827: [SPARK-23666][SQL] Do not display exprIds of Alia...

2018-03-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20827#discussion_r175621271
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 ---
@@ -324,31 +324,28 @@ case class AttributeReference(
  * A place holder used when printing expressions without debugging 
information such as the
  * expression id or the unresolved indicator.
  */
-case class PrettyAttribute(
+case class PrettyNamedExpression(
 name: String,
 dataType: DataType = NullType)
-  extends Attribute with Unevaluable {
--- End diff --

You mean the latest commit? I just renamed this because `Alias` is not an `Attribute`.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88392/testReport)** for PR 20829 at commit [`9624061`](https://github.com/apache/spark/commit/9624061920bfe0a2cf06af054d9988c95cd0b322).


---




[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20687
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1619/
Test PASSed.


---




[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20687
  
**[Test build #88391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88391/testReport)** for PR 20687 at commit [`5926301`](https://github.com/apache/spark/commit/592630148af19adbb72703dd1ff49f82c33087d2).


---




[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20687
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-19 Thread henryr
Github user henryr commented on the issue:

https://github.com/apache/spark/pull/20687
  
@gatorsmile ok, I think the coverage right now is a reasonable start; the other test cases I can think of would act more like they're exercising the expression-walking code, not the actual simplification. I look forward to collaborating on the follow-up PR.


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20795
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88390/testReport)** for PR 20829 at commit [`2b1fd4e`](https://github.com/apache/spark/commit/2b1fd4e36fffefb1e1679221353426cb05320104).


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20795
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88386/
Test FAILed.


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20853
  
**[Test build #88389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88389/testReport)** for PR 20853 at commit [`8a12452`](https://github.com/apache/spark/commit/8a124522519ed4f8fb750555f1a596c9f97b6947).


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20795
  
**[Test build #88386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88386/testReport)** for PR 20795 at commit [`93b115e`](https://github.com/apache/spark/commit/93b115eb398ce69afd63c3f11915d1bbbef15ce1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
Thanks for this work! One question: why do you think we need to support this API natively in Spark? Do other libraries support this as first-class?


---




[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-03-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20850
  
@hvanhovell btw (this is not related to this PR, though...), most of the code in `UTF8StringBuffer` and `BufferHolder` overlaps. So we could clean that up, too? 
https://github.com/apache/spark/compare/master...maropu:CleanupBufferImpl 


---




[GitHub] spark pull request #20803: [SPARK-23653][SQL] Show sql statement in spark SQ...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20803#discussion_r175609377
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -166,20 +168,28 @@ private[sql] object Dataset {
 class Dataset[T] private[sql](
 @transient val sparkSession: SparkSession,
 @DeveloperApi @InterfaceStability.Unstable @transient val 
queryExecution: QueryExecution,
-encoder: Encoder[T])
+encoder: Encoder[T],
+val sqlText: String = "")
--- End diff --

And how does the SQL shell execute commands like `SELECT * FROM ...`? Does it display all the rows or add a LIMIT before displaying? Generally we should not propagate the SQL text: a new DataFrame usually means the plan has changed, so the SQL text is no longer accurate.


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r175609234
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -40,29 +37,45 @@
  */
 public final class UnsafeRowWriter extends UnsafeWriter {
 
-  private final BufferHolder holder;
-  // The offset of the global buffer where we start to write this row.
-  private int startingOffset;
+  private final UnsafeRow row;
+
   private final int nullBitsSize;
   private final int fixedSize;
 
-  public UnsafeRowWriter(BufferHolder holder, int numFields) {
-this.holder = holder;
+  public UnsafeRowWriter(UnsafeRow row, int initialBufferSize) {
+this(row, new BufferHolder(row, initialBufferSize), row.numFields());
+  }
+
+  public UnsafeRowWriter(UnsafeRow row) {
+this(row, new BufferHolder(row), row.numFields());
+  }
+
+  public UnsafeRowWriter(UnsafeWriter writer, int numFields) {
+this(null, writer.getBufferHolder(), numFields);
+  }
+
+  private UnsafeRowWriter(UnsafeRow row, BufferHolder holder, int 
numFields) {
+super(holder);
+this.row = row;
 this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
 this.fixedSize = nullBitsSize + 8 * numFields;
-this.startingOffset = holder.cursor;
+this.startingOffset = cursor();
+  }
+
+  public void setTotalSize() {
--- End diff --

If we'd like to make generated code blocks easy to read, we should rely on 
generated comments instead of API names, I think. Anyway, this decision 
depends on other devs' thoughts.


---




[GitHub] spark pull request #20803: [SPARK-23653][SQL] Show sql statement in spark SQ...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20803#discussion_r175608866
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -166,20 +168,28 @@ private[sql] object Dataset {
 class Dataset[T] private[sql](
 @transient val sparkSession: SparkSession,
 @DeveloperApi @InterfaceStability.Unstable @transient val 
queryExecution: QueryExecution,
-encoder: Encoder[T])
+encoder: Encoder[T],
+val sqlText: String = "")
--- End diff --

what's the exact rule you defined to decide whether or not we should 
propagate the sql text?


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-19 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r175608496
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -86,11 +88,39 @@ public void grow(int neededSize) {
 }
   }
 
-  public void reset() {
+  byte[] buffer() {
+return buffer;
+  }
+
+  int getCursor() {
+return cursor;
+  }
+
+  void incrementCursor(int val) {
+cursor += val;
+  }
+
+  int pushCursor() {
--- End diff --

Can we make this a little bit less complex? I think just storing the cursor 
in the UnsafeWriter is enough.


---




[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20806
  
great! maybe we can hold this PR for a real SQL tree aggregate in the 
future, with some proper design and discussion.


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r175608099
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java
 ---
@@ -17,17 +17,71 @@
 package org.apache.spark.sql.catalyst.expressions.codegen;
 
 import org.apache.spark.sql.types.Decimal;
+import org.apache.spark.unsafe.Platform;
+import org.apache.spark.unsafe.array.ByteArrayMethods;
 import org.apache.spark.unsafe.types.CalendarInterval;
 import org.apache.spark.unsafe.types.UTF8String;
 
 /**
  * Base class for writing Unsafe* structures.
  */
 public abstract class UnsafeWriter {
+  // Keep internal buffer holder
+  protected final BufferHolder holder;
+
+  // The offset of the global buffer where we start to write this 
structure.
+  protected int startingOffset;
+
+  protected UnsafeWriter(BufferHolder holder) {
+this.holder = holder;
+  }
+
+  /**
+   * Accessor methods are delegated from BufferHolder class
+   */
+  public final BufferHolder getBufferHolder() {
+return holder;
+  }
+
+  public final byte[] buffer() { return holder.buffer(); }
--- End diff --

No performance impact?


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20579
  
**[Test build #88388 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88388/testReport)**
 for PR 20579 at commit 
[`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88381/
Test PASSed.


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1618/
Test PASSed.


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20701
  
**[Test build #88381 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88381/testReport)**
 for PR 20701 at commit 
[`f6ee4a2`](https://github.com/apache/spark/commit/f6ee4a2b4bb2444d65ab0e26a141304b327bd998).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class KMeansModel (@Since(\"1.0.0\") val clusterCenters: 
Array[Vector],`


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19222
  
Looks pretty good! My major concern is that the semantics of the offset in 
`MemoryBlock.getXXX` should be implementation independent: users should always 
assume the offset starts at 0. Since this is about the API semantics, it's better 
to get it right in the first place. Thanks for working on it!


---
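To illustrate the zero-based offset semantics being asked for here, a minimal 
sketch: the caller's offset is always relative to the start of the block, and 
the implementation adds its own internal base offset. Class and method names 
are illustrative, not Spark's actual `MemoryBlock` API.

```java
// Sketch: callers index from 0; the implementation hides where the data
// actually starts inside the backing array.
public class ByteArrayBlock {
    private final byte[] bytes;
    private final int base; // implementation detail, hidden from callers

    public ByteArrayBlock(byte[] bytes, int base) {
        this.bytes = bytes;
        this.base = base;
    }

    // offset is relative to the block start, regardless of `base`.
    public byte getByte(long offset) {
        return bytes[base + (int) offset]; // translate relative -> absolute
    }

    public static void main(String[] args) {
        ByteArrayBlock b = new ByteArrayBlock(new byte[] {10, 20, 30}, 1);
        System.out.println(b.getByte(0)); // prints 20, i.e. bytes[1]
    }
}
```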




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r175605407
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -119,15 +115,24 @@ public static UTF8String blankString(int length) {
 return fromBytes(spaces);
   }
 
-  protected UTF8String(Object base, long offset, int numBytes) {
+  protected UTF8String(byte[] bytes, long offset, int numBytes) {
+this(new ByteArrayMemoryBlock(bytes, offset, numBytes));
+  }
+
+  public UTF8String(MemoryBlock base) {
 this.base = base;
-this.offset = offset;
-this.numBytes = numBytes;
+if (base != null) {
+  if ((long) Integer.MAX_VALUE < base.size()) {
+throw new ArrayIndexOutOfBoundsException(
+  "MemoryBlock.size " + base.size() + " should be less than " + 
Integer.MAX_VALUE);
+  }
+  this.numBytes = (int) base.size();
--- End diff --

why not use `Ints.checkedCast`?


---
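For context, Guava's `Ints.checkedCast` narrows a `long` to an `int` and throws 
`IllegalArgumentException` on overflow, instead of silently truncating. A 
minimal sketch of the equivalent check (the class name is illustrative):

```java
// Sketch of a checked long-to-int narrowing, equivalent in spirit to
// Guava's Ints.checkedCast.
public class CheckedCast {
    static int checkedCast(long value) {
        int result = (int) value;          // may truncate
        if ((long) result != value) {      // detect truncation and fail loudly
            throw new IllegalArgumentException("Out of range: " + value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(checkedCast(42L)); // prints 42
    }
}
```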




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r175605220
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -119,15 +115,24 @@ public static UTF8String blankString(int length) {
 return fromBytes(spaces);
   }
 
-  protected UTF8String(Object base, long offset, int numBytes) {
+  protected UTF8String(byte[] bytes, long offset, int numBytes) {
+this(new ByteArrayMemoryBlock(bytes, offset, numBytes));
+  }
+
+  public UTF8String(MemoryBlock base) {
 this.base = base;
-this.offset = offset;
-this.numBytes = numBytes;
+if (base != null) {
--- End diff --

I think we can actually add an assert to make sure `base` is not null.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r175605033
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -119,15 +115,24 @@ public static UTF8String blankString(int length) {
 return fromBytes(spaces);
   }
 
-  protected UTF8String(Object base, long offset, int numBytes) {
+  protected UTF8String(byte[] bytes, long offset, int numBytes) {
+this(new ByteArrayMemoryBlock(bytes, offset, numBytes));
+  }
+
+  public UTF8String(MemoryBlock base) {
 this.base = base;
-this.offset = offset;
-this.numBytes = numBytes;
+if (base != null) {
+  if ((long) Integer.MAX_VALUE < base.size()) {
--- End diff --

do we need to do the cast? When comparing an int and a long, Java casts 
the int to long automatically.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r175604820
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -36,22 +42,34 @@ public MemoryBlock allocate(long size) throws 
OutOfMemoryError {
 
   @Override
   public void free(MemoryBlock memory) {
-assert (memory.obj == null) :
-  "baseObject not null; are you trying to use the off-heap allocator 
to free on-heap memory?";
-assert (memory.pageNumber != 
MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER) :
+assert(memory instanceof OffHeapMemoryBlock) :
+  "UnsafeMemoryAllocator can only free OffHeapMemoryBlock.";
+if (memory == OffHeapMemoryBlock.NULL) return;
+assert (memory.getPageNumber() != 
MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER) :
   "page has already been freed";
-assert ((memory.pageNumber == MemoryBlock.NO_PAGE_NUMBER)
-|| (memory.pageNumber == 
MemoryBlock.FREED_IN_TMM_PAGE_NUMBER)) :
+assert ((memory.getPageNumber() == MemoryBlock.NO_PAGE_NUMBER)
+|| (memory.getPageNumber() == 
MemoryBlock.FREED_IN_TMM_PAGE_NUMBER)) :
   "TMM-allocated pages must be freed via TMM.freePage(), not directly 
in allocator free()";
 
 if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
   memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_FREED_VALUE);
 }
+
 Platform.freeMemory(memory.offset);
+
 // As an additional layer of defense against use-after-free bugs, we 
mutate the
 // MemoryBlock to reset its pointer.
-memory.offset = 0;
+memory.resetObjAndOffset();
 // Mark the page as freed (so we can detect double-frees).
-memory.pageNumber = MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER;
+memory.setPageNumber(MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER);
+  }
+
+  public OffHeapMemoryBlock reallocate(OffHeapMemoryBlock block, long 
oldSize, long newSize) {
--- End diff --

no one is calling it.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r175604535
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,159 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
-  public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+  protected MemoryBlock(@Nullable Object obj, long offset, long length) {
+if (offset < 0 || length < 0) {
+  throw new ArrayIndexOutOfBoundsException(
+"Length " + length + " and offset " + offset + "must be 
non-negative");
+}
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  protected MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(
+"Instantiate MemoryBlock for type " + obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and 
size. The data is not
+   * copied. If parameters are invalid, an exception is thrown
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+  protected void checkSubBlockRange(long offset, long size) {
+if (this.offset + offset < 0 || size < 0) {
+  throw new ArrayIndexOutOfBoundsException(
+"Size " + size + " and offset " + (this.offset + offset) + " must 
be non-negative");
+}
+if (offset + size > length) {
+  throw new ArrayIndexOutOfBoundsException("The sum of size " + size + 
" and offset " +
+offset + " should not be larger than the length " + length + " in 
the MemoryBlock");
+}
+  }
+
+  /**
+   * getXXX/putXXX does not ensure guarantee behavior if the offset is 
invalid. e.g  cause illegal
+   * memory access, throw an exception, or etc.
+   */
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public 

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r175604062
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java
 ---
@@ -94,12 +95,12 @@ public void free(MemoryBlock memory) {
 }
 
 // Mark the page as freed (so we can detect double-frees).
-memory.pageNumber = MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER;
+memory.setPageNumber(MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER);
 
 // As an additional layer of defense against use-after-free bugs, we 
mutate the
 // MemoryBlock to null out its reference to the long[] array.
-long[] array = (long[]) memory.obj;
-memory.setObjAndOffset(null, 0);
+long[] array = ((OnHeapMemoryBlock)memory).getLongArray();
+memory.resetObjAndOffset();
 
 long alignedSize = ((size + 7) / 8) * 8;
 if (shouldPool(alignedSize)) {
--- End diff --

I think in the future we should cache `MemoryBlock` directly, so that we 
can have a unified pool for both the on-heap and off-heap memory managers.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20796
  
LGTM, pending jenkins


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20847
  
**[Test build #88387 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88387/testReport)**
 for PR 20847 at commit 
[`f130a36`](https://github.com/apache/spark/commit/f130a3603386acee670788d545a9d38969fcf39f).


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20847
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1617/
Test PASSed.


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20847
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20847
  
Retest this please.


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20847
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88382/
Test FAILed.


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20847
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20847
  
**[Test build #88382 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88382/testReport)**
 for PR 20847 at commit 
[`f130a36`](https://github.com/apache/spark/commit/f130a3603386acee670788d545a9d38969fcf39f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20795
  
**[Test build #88386 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88386/testReport)**
 for PR 20795 at commit 
[`93b115e`](https://github.com/apache/spark/commit/93b115eb398ce69afd63c3f11915d1bbbef15ce1).


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r175592083
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -49,55 +51,81 @@ public static int hashInt(int input, int seed) {
   }
 
   public int hashUnsafeWords(Object base, long offset, int lengthInBytes) {
-return hashUnsafeWords(base, offset, lengthInBytes, seed);
+return hashUnsafeWordsBlock(MemoryBlock.allocateFromObject(base, 
offset, lengthInBytes), seed);
+  }
+
+  public static int hashUnsafeWordsBlock(MemoryBlock base, int seed) {
+// This is based on Guava's 
`Murmur32_Hasher.processRemaining(ByteBuffer)` method.
+int lengthInBytes = Ints.checkedCast(base.size());
+assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 
8 (word-aligned)";
+int h1 = hashBytesByIntBlock(base, seed);
+return fmix(h1, lengthInBytes);
   }
 
   public static int hashUnsafeWords(Object base, long offset, int 
lengthInBytes, int seed) {
 // This is based on Guava's 
`Murmur32_Hasher.processRemaining(ByteBuffer)` method.
 assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 
8 (word-aligned)";
-int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
+int h1 = hashBytesByIntBlock(MemoryBlock.allocateFromObject(base, 
offset, lengthInBytes), seed);
--- End diff --

It's more consistent to call `hashUnsafeWordsBlock` here.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20208
  
Which document do you have in mind? A section in 
`docs/sql-programming-guide.md`?


---




[GitHub] spark pull request #20796: [SPARK-23649][SQL] Skipping chars disallowed in U...

2018-03-19 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20796#discussion_r175590297
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -57,12 +57,39 @@
   public Object getBaseObject() { return base; }
   public long getBaseOffset() { return offset; }
 
-  private static int[] bytesOfCodePointInUTF8 = {2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2,
-2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
-3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
-4, 4, 4, 4, 4, 4, 4, 4,
-5, 5, 5, 5,
-6, 6};
+  /**
+   * A char in UTF-8 encoding can take 1-4 bytes depending on the first 
byte which
+   * indicates the size of the char. See Unicode standard in page 126:
+   * http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf
+   *
+   * Binary    Hex          Comments
+   * 0xxxxxxx  0x00..0x7F   Only byte of a 1-byte character encoding
+   * 10xxxxxx  0x80..0xBF   Continuation bytes (1-3 continuation bytes)
+   * 110xxxxx  0xC0..0xDF   First byte of a 2-byte character encoding
--- End diff --

Here is the table of allowed first bytes:
https://user-images.githubusercontent.com/1580697/37623134-eca5756a-2bc3-11e8-99a0-3b538697d15c.png



---
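The first-byte table under discussion maps directly to a small range check. A 
minimal sketch (not Spark's actual implementation), assuming invalid first 
bytes — continuation bytes 0x80..0xBF and the disallowed 0xF8..0xFF — map to -1:

```java
// Sketch: derive a UTF-8 character's encoded length from its first byte,
// following the Unicode table of allowed first bytes.
public class Utf8FirstByte {
    static int numBytesForFirstByte(int firstByte) {
        int b = firstByte & 0xFF;
        if (b < 0x80) return 1;   // 0xxxxxxx: single-byte (ASCII) character
        if (b < 0xC0) return -1;  // 10xxxxxx: continuation byte, invalid start
        if (b < 0xE0) return 2;   // 110xxxxx: first byte of a 2-byte character
        if (b < 0xF0) return 3;   // 1110xxxx: first byte of a 3-byte character
        if (b < 0xF8) return 4;   // 11110xxx: first byte of a 4-byte character
        return -1;                // 0xF8..0xFF are disallowed in UTF-8
    }

    public static void main(String[] args) {
        System.out.println(numBytesForFirstByte(0xC3)); // prints 2
    }
}
```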




[GitHub] spark pull request #20796: [SPARK-23649][SQL] Skipping chars disallowed in U...

2018-03-19 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20796#discussion_r175589638
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java 
---
@@ -791,4 +795,21 @@ public void trimRightWithTrimString() {
 assertEquals(fromString("头"), 
fromString("头a???/").trimRight(fromString("数?/*&^%a")));
 assertEquals(fromString("头"), fromString("头数b数数 
[").trimRight(fromString(" []数b")));
   }
+
+  @Test
+  public void skipWrongFirstByte() {
+int[] wrongFirstBytes = {
--- End diff --

The bytes are not filtered by UTF8String methods. For instance, in the case 
of the CSV data source, the invalid bytes are just passed through to the final 
result. See https://issues.apache.org/jira/browse/SPARK-23649

I have created a separate ticket to fix the issue: 
https://issues.apache.org/jira/browse/SPARK-23741 . 

I am not sure the issue of outputting wrong UTF-8 chars should be addressed 
by this PR (this PR just fixes crashes on wrong input) because it could impact 
users and other Spark components. We need to discuss and test it carefully.


---




[GitHub] spark issue #20859: [SPARK-23702][SS] Forbid watermarks on both sides of sta...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20859
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88383/
Test FAILed.


---




[GitHub] spark issue #20859: [SPARK-23702][SS] Forbid watermarks on both sides of sta...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20859
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20859: [SPARK-23702][SS] Forbid watermarks on both sides of sta...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20859
  
**[Test build #88383 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88383/testReport)**
 for PR 20859 at commit 
[`051c85a`](https://github.com/apache/spark/commit/051c85afe727f39ba9d505e00e162620f69c808f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20818: [SPARK-23675][WEB-UI]Title add spark logo, use spark log...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20818
  
**[Test build #4141 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4141/testReport)**
 for PR 20818 at commit 
[`964439b`](https://github.com/apache/spark/commit/964439be7a592b2a94f93008dabc45a18f97c3c6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20859: [SPARK-23702][SS] Forbid watermarks on both sides...

2018-03-19 Thread jose-torres
Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20859#discussion_r175585873
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
 ---
@@ -160,6 +160,19 @@ object UnsupportedOperationChecker {
 case _: InsertIntoDir =>
   throwError("InsertIntoDir is not supported with streaming 
DataFrames/Datasets")
 
+case e: EventTimeWatermark =>
+  val statefulChildren = e.collect {
+case a: Aggregate if a.isStreaming => a
+case d: Deduplicate if d.isStreaming => d
+case f: FlatMapGroupsWithState if f.isStreaming => f
+  }
+  statefulChildren.foreach { statefulNode =>
+if (statefulNode.collectFirst{ case e: EventTimeWatermark => e 
}.isDefined) {
+  throwError("Watermarks both before and after a stateful 
operator in a streaming " +
--- End diff --

WDYT of something like "watermarks may not be present..."? Talking about 
"well-defined" seems a bit confusing to me.


---




[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20853#discussion_r175585634
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala ---
@@ -137,16 +138,36 @@ private[deploy] object DependencyUtils {
   def resolveGlobPaths(paths: String, hadoopConf: Configuration): String = {
 require(paths != null, "paths cannot be null.")
 Utils.stringToSeq(paths).flatMap { path =>
-  val uri = Utils.resolveURI(path)
-  uri.getScheme match {
-case "local" | "http" | "https" | "ftp" => Array(path)
-case _ =>
-  val fs = FileSystem.get(uri, hadoopConf)
-  Option(fs.globStatus(new Path(uri))).map { status =>
-status.filter(_.isFile).map(_.getPath.toUri.toString)
-  }.getOrElse(Array(path))
+  val (base, fragment) = splitOnFragment(Utils.resolveURI(path))
+  (resolveGlobPath(base, hadoopConf), fragment) match {
+case (resolved: Array[String], Some(_)) if resolved.length > 1 => throw new SparkException(
+s"${base.toString} resolves ambiguously to multiple files: ${resolved.mkString(",")}")
+case (resolved: Array[String], Some(namedAs)) => resolved.map( _ + "#" + namedAs)
--- End diff --
--- End diff --

Same here.
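For context, the fragment handling this diff introduces can be sketched as follows. The helper names match the diff, but the bodies here are illustrative assumptions, not the PR's actual implementation:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch: strip a "#name" fragment before globbing, then re-attach it to
// each resolved path, refusing globs that match more than one file.
class FragmentPaths {
    // Split "hdfs:/tmp/file.py#renamed" into { base URI, fragment }.
    static String[] splitOnFragment(String path) {
        try {
            URI uri = new URI(path);
            URI base = new URI(uri.getScheme(), uri.getSchemeSpecificPart(), null);
            return new String[] { base.toString(), uri.getFragment() };
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Re-attach the fragment to every glob result; multiple results with a
    // fragment are ambiguous, since the rename can only target one file.
    static String[] reattach(String[] resolved, String fragment) {
        if (fragment == null) return resolved;
        if (resolved.length > 1) {
            throw new IllegalArgumentException("resolves ambiguously to multiple files");
        }
        String[] out = new String[resolved.length];
        for (int i = 0; i < resolved.length; i++) out[i] = resolved[i] + "#" + fragment;
        return out;
    }
}
```

This preserves the YARN-style `path#linkName` rename through glob expansion.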


---




[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20853#discussion_r175585581
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala ---
@@ -137,16 +138,36 @@ private[deploy] object DependencyUtils {
   def resolveGlobPaths(paths: String, hadoopConf: Configuration): String = {
 require(paths != null, "paths cannot be null.")
 Utils.stringToSeq(paths).flatMap { path =>
-  val uri = Utils.resolveURI(path)
-  uri.getScheme match {
-case "local" | "http" | "https" | "ftp" => Array(path)
-case _ =>
-  val fs = FileSystem.get(uri, hadoopConf)
-  Option(fs.globStatus(new Path(uri))).map { status =>
-status.filter(_.isFile).map(_.getPath.toUri.toString)
-  }.getOrElse(Array(path))
+  val (base, fragment) = splitOnFragment(Utils.resolveURI(path))
+  (resolveGlobPath(base, hadoopConf), fragment) match {
+case (resolved: Array[String], Some(_)) if resolved.length > 1 => throw new SparkException(
--- End diff --

Type inference is not working here?


---




[GitHub] spark issue #20809: [SPARK-23667][CORE] Better scala version check

2018-03-19 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20809
  
> This is on travis and no SPARK_HOME is set.

That sounds a little odd. If that is true, then your proposed code wouldn't 
work either, since it requires SPARK_HOME to be known.

In any case, there are two calls to `getScalaVersion()`.

First is:
```
boolean prependClasses = !isEmpty(getenv("SPARK_PREPEND_CLASSES"));
boolean isTesting = "1".equals(getenv("SPARK_TESTING"));
if (prependClasses || isTesting) {
  String scala = getScalaVersion();
```

And your code shouldn't be triggering that, since both env variables are 
for Spark development and other applications shouldn't be using them.

Second call is a little later:
```
String jarsDir = findJarsDir(getSparkHome(), getScalaVersion(), !isTesting && !isTestingSql);
```

Here `getScalaVersion()` is only needed when running Spark from a git 
clone, not from the distribution package. So the right thing would be to move 
`getScalaVersion()` to `CommandBuilderUtils`, and call it from `findJarsDir` 
only if needed.
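The suggested refactor amounts to making the version lookup lazy. A minimal sketch, with a hypothetical signature (the real `findJarsDir` lives in `CommandBuilderUtils` and takes different parameters):

```java
import java.io.File;
import java.util.function.Supplier;

// Sketch: take the Scala version as a Supplier so it is only forced when
// running from a git clone, where "jars/" is absent and the versioned
// assembly directory must be derived. A distribution never needs it.
class JarsDirSketch {
    static String findJarsDir(String sparkHome, Supplier<String> scalaVersion) {
        File dist = new File(sparkHome, "jars");
        if (dist.isDirectory()) {
            return dist.getAbsolutePath();   // distribution layout: no version lookup
        }
        // git-clone layout: only now is the Scala version actually needed
        return sparkHome + "/assembly/target/scala-" + scalaVersion.get() + "/jars";
    }
}
```

With this shape, an application that sets `SPARK_HOME` to a packaged distribution never exercises the version-detection code path at all.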


---




[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20850
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20850
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88380/
Test PASSed.


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20853
  
**[Test build #88385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88385/testReport)** for PR 20853 at commit [`9f391de`](https://github.com/apache/spark/commit/9f391de914e3be5ed61e00d1c651349db3d52cb6).


---




[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20850
  
**[Test build #88380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88380/testReport)** for PR 20850 at commit [`c342f0d`](https://github.com/apache/spark/commit/c342f0da350f53ad0683bda115174d4debbcf1e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20208
  
Let us write the document in this PR? Document our current behaviors with 
the related test cases. 


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20853
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20853
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88379/
Test PASSed.


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20853
  
**[Test build #88379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88379/testReport)** for PR 20853 at commit [`50b5ad1`](https://github.com/apache/spark/commit/50b5ad1289fabffe4f661d0b44fcc9ef3ecd2367).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r175581710
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java ---
@@ -48,6 +49,16 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
   public static int MAX_ROUNDED_ARRAY_LENGTH = Integer.MAX_VALUE - 15;
 
   private static final boolean unaligned = Platform.unaligned();
+  /**
+   * MemoryBlock equality check for MemoryBlocks.
+   * @return true if the arrays are equal, false otherwise
+   */
+  public static boolean arrayEqualsBlock(
--- End diff --

no one is calling it.


---




[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20796
  
**[Test build #88384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88384/testReport)** for PR 20796 at commit [`5557a80`](https://github.com/apache/spark/commit/5557a80d4674e929332d9441342e5b90e314eb45).


---




[GitHub] spark pull request #20816: [SPARK-21479][SQL] Outer join filter pushdown in ...

2018-03-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20816#discussion_r175580746
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -669,11 +672,42 @@ object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelpe
   val newConditionOpt = conditionOpt match {
 case Some(condition) =>
   val newFilters = additionalConstraints -- splitConjunctivePredicates(condition)
-  if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), condition)) else None
+  if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), condition)) else conditionOpt
 case None =>
   additionalConstraints.reduceOption(And)
   }
-  if (newConditionOpt.isDefined) Join(left, right, joinType, newConditionOpt) else join
+  // Infer filter for left/right outer joins
+  val newLeftOpt = joinType match {
+case RightOuter if newConditionOpt.isDefined =>
+  val rightConstraints = right.constraints.union(
+splitConjunctivePredicates(newConditionOpt.get).toSet)
+  val inferredConstraints = ExpressionSet(
+QueryPlanConstraints.inferAdditionalConstraints(rightConstraints))
+  val leftConditions = inferredConstraints
--- End diff --

Let us leave `allConstraints` untouched. We should avoid the extra code changes in this PR. We need to use `conf.constraintPropagationEnabled` for the extra constraints introduced by this PR.


---



