[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20653
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19788
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20653
  
**[Test build #87612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87612/testReport)** for PR 20653 at commit [`b5c8d56`](https://github.com/apache/spark/commit/b5c8d562434354ed6e68f41b32c79eda60ae6d07).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19788
  
**[Test build #87610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87610/testReport)** for PR 19788 at commit [`583666b`](https://github.com/apache/spark/commit/583666bf5ae6102180851fa1cc3129cc344e919c).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20653
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87612/
Test FAILed.





[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19788
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87610/
Test FAILed.





[GitHub] spark pull request #20649: [SPARK-23462][SQL] improve missing field error me...

2018-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20649#discussion_r169882987
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StructTypeSuite.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.types.StructType
+
+class StructTypeSuite extends SparkFunSuite {
--- End diff --

Could we put this file under `sql/catalyst/src/test/scala/org/apache/spark/sql/types`, or move the test cases into `DataTypeSuite`?





[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-22 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/19788
  
@cloud-fan if encryption is enabled (`blockManager.serializerManager().encryptionEnabled() == true`), shall we disable this feature as well?





[GitHub] spark pull request #19788: [SPARK-9853][Core] Optimize shuffle fetch of cont...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19788#discussion_r169898393
  
--- Diff: project/MimaExcludes.scala ---
@@ -1129,6 +1129,12 @@ object MimaExcludes {
   ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasLoss.org$apache$spark$ml$param$shared$HasLoss$_setter_$loss_="),
   ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasLoss.getLoss"),
   ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasLoss.loss")
+) ++ Seq(
+  // [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs.
+  ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.FetchFailed$"),
--- End diff --

We can get rid of this if we don't touch `FetchFailed`.





[GitHub] spark pull request #19788: [SPARK-9853][Core] Optimize shuffle fetch of cont...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19788#discussion_r169899114
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -295,8 +307,8 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
    * and the second item is a sequence of (shuffle block id, shuffle block size) tuples
    * describing the shuffle blocks that are stored at that block manager.
    */
-  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int)
-      : Seq[(BlockManagerId, Seq[(BlockId, Long)])]
+  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int,
+      serializer: Serializer = null): Seq[(BlockManagerId, Seq[(BlockId, Long)])]
--- End diff --

If possible, let's not introduce a default value here.





[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20651
  
@zsxwing SKIPPED stages were previously shown as PENDING. Although I agree that your fix is needed, I think this one is needed too. Do you agree, @vanzin?






[GitHub] spark pull request #19788: [SPARK-9853][Core] Optimize shuffle fetch of cont...

2018-02-22 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19788#discussion_r169901683
  
--- Diff: project/MimaExcludes.scala ---
@@ -1129,6 +1129,12 @@ object MimaExcludes {
   ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasLoss.org$apache$spark$ml$param$shared$HasLoss$_setter_$loss_="),
   ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasLoss.getLoss"),
   ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasLoss.loss")
+) ++ Seq(
+  // [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs.
+  ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.FetchFailed$"),
--- End diff --

I see, so I will keep `FetchFailed` as it was (no `numBlocks` introduced). Is that OK?





[GitHub] spark pull request #19788: [SPARK-9853][Core] Optimize shuffle fetch of cont...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19788#discussion_r169902931
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -295,8 +307,8 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
    * and the second item is a sequence of (shuffle block id, shuffle block size) tuples
    * describing the shuffle blocks that are stored at that block manager.
    */
-  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int)
-      : Seq[(BlockManagerId, Seq[(BlockId, Long)])]
+  def getMapSizesByExecutorId(shuffleId: Int, startPartition: Int, endPartition: Int,
+      serializer: Serializer = null): Seq[(BlockManagerId, Seq[(BlockId, Long)])]
--- End diff --

I'd like to pass in a `serializerRelocatable: Boolean` instead of a serializer.
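The two suggestions in this thread (drop the `serializer` default value, and pass a relocatability flag instead of a `Serializer`) can be sketched as below. This is a hedged illustration only: `MapSizeLookup`, `ToyTracker`, the `String`-based block ids, and the fixed 100-byte block sizes are stand-ins invented for the sketch, not Spark's actual `MapOutputTracker` types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the reviewed signature: a boolean flag, no default value.
interface MapSizeLookup {
  // Returns executorId -> list of (blockId, size) pairs.
  Map<String, List<Map.Entry<String, Long>>> getMapSizesByExecutorId(
      int shuffleId, int startPartition, int endPartition, boolean serializerRelocatable);
}

class ToyTracker implements MapSizeLookup {
  @Override
  public Map<String, List<Map.Entry<String, Long>>> getMapSizesByExecutorId(
      int shuffleId, int startPartition, int endPartition, boolean serializerRelocatable) {
    List<Map.Entry<String, Long>> blocks = new ArrayList<>();
    if (serializerRelocatable) {
      // A relocatable serializer lets contiguous partitions be fetched as one merged block.
      long total = 100L * (endPartition - startPartition);
      blocks.add(Map.entry(
          "shuffle_" + shuffleId + "_0_" + startPartition + "_" + endPartition, total));
    } else {
      // Otherwise each partition stays a separate block.
      for (int p = startPartition; p < endPartition; p++) {
        blocks.add(Map.entry("shuffle_" + shuffleId + "_0_" + p, 100L));
      }
    }
    return Map.of("exec-1", blocks);
  }
}
```

Forcing every caller to state the flag explicitly (no default) is what makes the new code path auditable at each call site.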





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20659
  
**[Test build #87613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87613/testReport)** for PR 20659 at commit [`a5bb731`](https://github.com/apache/spark/commit/a5bb731985488892ef9bc8ec9bbcff2a218d0130).





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1005/
Test PASSed.





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20659
  
**[Test build #87613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87613/testReport)** for PR 20659 at commit [`a5bb731`](https://github.com/apache/spark/commit/a5bb731985488892ef9bc8ec9bbcff2a218d0130).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87613/
Test FAILed.





[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20653
  
retest this please





[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20653
  
**[Test build #87614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87614/testReport)** for PR 20653 at commit [`b5c8d56`](https://github.com/apache/spark/commit/b5c8d562434354ed6e68f41b32c79eda60ae6d07).





[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20653
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20653
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1006/
Test PASSed.





[GitHub] spark pull request #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20651#discussion_r169919844
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllStagesPage.scala ---
@@ -172,6 +185,20 @@ private[ui] class AllStagesPage(parent: StagesTab) extends WebUIPage("") {
       {completedStagesTable.toNodeSeq}

     }
+if (shouldShowSkippedStages) {
+  content ++=
+

[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20651
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20651
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1007/
Test PASSed.





[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20651
  
**[Test build #87615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87615/testReport)** for PR 20651 at commit [`e04bf24`](https://github.com/apache/spark/commit/e04bf24ed0a712aa380de8dd929c7c1d4fdb2990).





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20659
  
**[Test build #87616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87616/testReport)** for PR 20659 at commit [`80fd8a8`](https://github.com/apache/spark/commit/80fd8a8aa3c3e42cd99f164f80cfcc6f46e2f247).





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1008/
Test PASSed.





[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20648
  
From the code, it looks like there was already an intention to produce partial results when parsing a document fails; this patch actually produces them. But this should be considered a behavior change, and we should discuss whether it is acceptable.





[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20653
  
**[Test build #87614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87614/testReport)** for PR 20653 at commit [`b5c8d56`](https://github.com/apache/spark/commit/b5c8d562434354ed6e68f41b32c79eda60ae6d07).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20653
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87614/
Test PASSed.





[GitHub] spark issue #20653: [SPARK-23459][SQL] Improve the error message when unknow...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20653
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20659
  
**[Test build #87616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87616/testReport)** for PR 20659 at commit [`80fd8a8`](https://github.com/apache/spark/commit/80fd8a8aa3c3e42cd99f164f80cfcc6f46e2f247).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87616/
Test FAILed.





[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20651
  
**[Test build #87615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87615/testReport)** for PR 20651 at commit [`e04bf24`](https://github.com/apache/spark/commit/e04bf24ed0a712aa380de8dd929c7c1d4fdb2990).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20651
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87615/
Test FAILed.





[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20651
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20648
  
I was just double-checking the current status for both CSV and JSON:

CSV fills in partial results, along with an exception (which is caught by permissive mode and recorded as the corrupt record text), when the length of the schema is mismatched, whereas JSON allows this case without the exception (and also without the corrupt record text).

Also, neither CSV nor JSON currently fills in partial results when the type of a field is mismatched, but this change adds support for that case in JSON.

cc @cloud-fan too. I remember we had a short talk about these partial results before. Do you think we should produce partial results for mismatched types in both CSV and JSON? As far as I can tell we do this for neither today, unless I have missed something.
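The "partial results" behavior being debated above can be shown with a toy parser (this is an illustration of the idea only, not Spark's actual JSON/CSV parser; the two-field record shape and the `_corrupt_record` key are simplified stand-ins):

```java
import java.util.HashMap;
import java.util.Map;

class PartialRowSketch {
  // Parses a record with an int field "a" and a string field "b" permissively:
  // when "a" fails to convert, keep the fields that did parse ("b"), null out
  // only the bad field, and record the corrupt text, instead of returning an
  // all-null row.
  static Map<String, Object> parse(String a, String b) {
    Map<String, Object> row = new HashMap<>();
    try {
      row.put("a", Integer.parseInt(a));
    } catch (NumberFormatException e) {
      row.put("a", null);                        // only the mismatched field is nulled
      row.put("_corrupt_record", "a=" + a + ", b=" + b);
    }
    row.put("b", b);                             // well-typed field survives
    return row;
  }
}
```

The behavior-change question is exactly whether a type mismatch should take the `catch` path above (partial row) or null the whole row as before.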







[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20651
  
Jenkins, retest this please





[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20651
  
**[Test build #87617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87617/testReport)** for PR 20651 at commit [`e04bf24`](https://github.com/apache/spark/commit/e04bf24ed0a712aa380de8dd929c7c1d4fdb2990).





[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20651
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1009/
Test PASSed.





[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20651
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #19788: [SPARK-9853][Core] Optimize shuffle fetch of cont...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19788#discussion_r169967506
  
--- Diff: project/MimaExcludes.scala ---
@@ -1129,6 +1129,12 @@ object MimaExcludes {
   ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasLoss.org$apache$spark$ml$param$shared$HasLoss$_setter_$loss_="),
   ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasLoss.getLoss"),
   ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasLoss.loss")
+) ++ Seq(
+  // [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs.
+  ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.FetchFailed$"),
--- End diff --

yup





[GitHub] spark pull request #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/20659





[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...

2018-02-22 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20618
  
Was the conclusion to add more changes to this PR, or to add them in a separate PR?





[GitHub] spark issue #20652: [SPARK-23476][CORE] Generate secret in local mode when a...

2018-02-22 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20652
  
lgtm





[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r169983494
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java ---
@@ -58,7 +58,8 @@ public MemoryBlock allocate(long size) throws OutOfMemoryError {
     final long[] array = arrayReference.get();
     if (array != null) {
       assert (array.length * 8L >= size);
-      MemoryBlock memory = new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size);
+      MemoryBlock memory =
+        new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size);
--- End diff --

`OnHeapMemoryBlock.fromArray`? We should hide `Platform.LONG_ARRAY_OFFSET` from the caller side.
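The factory-method refactor suggested here might look like the sketch below: callers hand over a `long[]` and never see the base-offset constant. The class shape and the `16L` offset value are simplified stand-ins invented for illustration (the real offset comes from `Platform.LONG_ARRAY_OFFSET` and is JVM-dependent), not the actual `OnHeapMemoryBlock`.

```java
class OnHeapBlockSketch {
  // Illustrative stand-in for Platform.LONG_ARRAY_OFFSET.
  private static final long LONG_ARRAY_OFFSET = 16L;

  final long[] array;
  final long offset;
  final long length;

  // Private: construction goes through the factory so the offset stays hidden.
  private OnHeapBlockSketch(long[] array, long offset, long length) {
    this.array = array;
    this.offset = offset;
    this.length = length;
  }

  // The suggested factory: the base offset is an implementation detail
  // supplied here, not by the caller.
  static OnHeapBlockSketch fromArray(long[] array, long size) {
    assert array.length * 8L >= size;
    return new OnHeapBlockSketch(array, LONG_ARRAY_OFFSET, size);
  }
}
```

Making the constructor private and exposing only `fromArray` is what removes `LONG_ARRAY_OFFSET` from every allocation site.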





[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r169983548
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java ---
@@ -70,7 +71,7 @@ public MemoryBlock allocate(long size) throws OutOfMemoryError {
       }
     }
     long[] array = new long[numWords];
-    MemoryBlock memory = new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size);
+    MemoryBlock memory = new OnHeapMemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size);
--- End diff --

ditto





[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r169983885
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -22,10 +22,9 @@
 import org.apache.spark.unsafe.Platform;

 /**
- * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size.
+ * A declaration of interfaces of MemoryBlock classes .
--- End diff --

`A representation of a consecutive memory block in Spark. It defines the common interfaces for memory accessing and mutating.`





[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r169984899
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +44,149 @@
    */
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;

-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected long length;

   /**
    * Optional page number; used when this MemoryBlock represents a page allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber method so that
+   * this can be modified by the TaskMemoryManager, which lives in a different package.
    */
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;

   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-    super(obj, offset);
+    this.obj = obj;
+    this.offset = offset;
     this.length = length;
   }

+  public MemoryBlock() {
+    this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+    return obj;
+  }
+
+  public final long getBaseOffset() {
+    return offset;
+  }
+
+  public void resetObjAndOffset() {
+    this.obj = null;
+    this.offset = 0;
+  }
+
   /**
    * Returns the size of the memory block.
    */
-  public long size() {
+  public final long size() {
     return length;
   }

-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-    return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L);
+  public final void setPageNumber(int pageNum) {
+    pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+    return pageNumber;
   }

   /**
    * Fills the memory block with the specified byte value.
    */
-  public void fill(byte value) {
+  public final void fill(byte value) {
     Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) {
+    MemoryBlock mb = null;
+    if (obj instanceof byte[]) {
+      byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Instantiate the same type of MemoryBlock with new offset and size
+   */
+  public abstract MemoryBlock allocate(long offset, long size);
+
+
+  public abstract int getInt(long offset);
--- End diff --

What should we expect the offset to be? Ideally I think we should assume 
the beginning offset of this memory block is 0, and concrete implementations 
should adjust by adding `Platform.BYTE_ARRAY_OFFSET` or something accordingly.
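A minimal, self-contained sketch of the offset convention suggested above: the abstract API takes offsets relative to the start of the block, and each concrete subclass translates them to its backing storage. The names here are illustrative stand-ins, not Spark's Platform-based implementation (which uses a long[] plus Platform.LONG_ARRAY_OFFSET).

```java
// Sketch of the 0-based offset convention: callers pass offsets relative to the
// start of the block; the concrete subclass translates to its backing storage.
// "Block"/"HeapBlock" are illustrative, not Spark's Platform-based code.
abstract class Block {
    abstract int getInt(long offset);            // offset is relative to block start
    abstract void putInt(long offset, int value);
}

class HeapBlock extends Block {
    private final int[] array;                   // stand-in for long[] + Platform offsets
    HeapBlock(int numInts) { this.array = new int[numInts]; }
    @Override int getInt(long offset) { return array[(int) (offset / 4)]; }
    @Override void putInt(long offset, int value) { array[(int) (offset / 4)] = value; }
}

public class OffsetDemo {
    public static void main(String[] args) {
        Block b = new HeapBlock(4);
        b.putInt(8, 42);                         // no Platform.*_ARRAY_OFFSET at the call site
        if (b.getInt(8) != 42) throw new AssertionError("round-trip failed");
        System.out.println(b.getInt(8));         // prints 42
    }
}
```

With this convention the translation happens exactly once, inside each subclass, instead of at every call site.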


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r169985377
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +44,149 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Instantiate the same type of MemoryBlock with new offset and size
+   */
+  public abstract MemoryBlock allocate(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public abstract Object getObjectVolatile(long offset);
--- End diff --

do we really need this?


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r169987267
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +44,149 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Instantiate the same type of MemoryBlock with new offset and size
+   */
+  public abstract MemoryBlock allocate(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public abstract Object getObjectVolatile(long offset);
+
+  public abstract void putObjectVolatile(long offset, Object value);
+
+  public static void copyMemory(
+  MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, 
long length) {
+Platform.copyMemory(src.getBaseObject(), srcOffset, 
dst.getBaseObject(), dstOffset, length);
+  }
+
+  public abstract void copyFrom(byte[] src, long srcOffset, long 
dstOffset, long length);
--- End diff --

2 thoughts:
1. How much benefit can we get by making this method abstract? Is it as 
significant as other methods like 

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r169987802
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +44,149 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Instantiate the same type of MemoryBlock with new offset and size
+   */
+  public abstract MemoryBlock allocate(long offset, long size);
+
+
+  public abstract int getInt(long offset);
--- End diff --

Looks like we now ask the caller side to add `MemoryBlock.baseOffset` themselves; I think we should not leave this to the caller side. 

---




[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

2018-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20647#discussion_r169990652
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
@@ -107,17 +106,24 @@ case class DataSourceV2Relation(
 }
 
 /**
- * A specialization of DataSourceV2Relation with the streaming bit set to 
true. Otherwise identical
- * to the non-streaming relation.
+ * A specialization of [[DataSourceV2Relation]] with the streaming bit set 
to true.
+ *
+ * Note that, this plan has a mutable reader, so Spark won't apply 
operator push-down for this plan,
+ * to avoid making the plan mutable. We should consolidate this plan and 
[[DataSourceV2Relation]]
+ * after we figure out how to apply operator push-down for streaming data 
sources.
--- End diff --

We can also ask the implementations to clear out all the state after `createX` is called; then we don't need to add any new APIs. In any case, there should be a detailed design doc for data source v2 operator push-down; it doesn't block this PR.
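One way to read the "clear out all the state after createX is called" alternative, sketched with purely illustrative names (this is not the DataSourceV2 API): the source snapshots and resets its mutable push-down state when the reader is created, so the plan holding it is effectively immutable afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative-only sketch: push-down state is snapshotted and cleared when the
// reader is created, so whatever holds this object sees it as effectively
// immutable after createReader() returns.
public class ResettableSource {
    private final List<String> pushedFilters = new ArrayList<>();

    void pushFilter(String filter) { pushedFilters.add(filter); }

    List<String> createReader() {                 // the "createX" from the comment
        List<String> snapshot = new ArrayList<>(pushedFilters);
        pushedFilters.clear();                    // clear all mutable state after createX
        return snapshot;
    }

    public static void main(String[] args) {
        ResettableSource src = new ResettableSource();
        src.pushFilter("a > 1");
        if (src.createReader().size() != 1) throw new AssertionError();
        if (!src.createReader().isEmpty()) throw new AssertionError(); // state was cleared
        System.out.println("state cleared after createReader");
    }
}
```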


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-22 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r169995800
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +44,149 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Instantiate the same type of MemoryBlock with new offset and size
+   */
+  public abstract MemoryBlock allocate(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public abstract Object getObjectVolatile(long offset);
--- End diff --

Since Spark does not use `xxxObjectVolatile`, we can remove them.


---




[GitHub] spark pull request #20652: [SPARK-23476][CORE] Generate secret in local mode...

2018-02-22 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20652#discussion_r169996395
  
--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala 
---
@@ -440,23 +440,41 @@ class SecurityManagerSuite extends SparkFunSuite with 
ResetSystemProperties {
 assert(keyFromEnv === new SecurityManager(conf2).getSecretKey())
   }
 
-  test("secret key generation in yarn mode") {
-val conf = new SparkConf()
-  .set(NETWORK_AUTH_ENABLED, true)
-  .set(SparkLauncher.SPARK_MASTER, "yarn")
-val mgr = new SecurityManager(conf)
-
-UserGroupInformation.createUserForTesting("authTest", Array()).doAs(
-  new PrivilegedExceptionAction[Unit]() {
-override def run(): Unit = {
-  mgr.initializeAuth()
-  val creds = 
UserGroupInformation.getCurrentUser().getCredentials()
-  val secret = 
creds.getSecretKey(SecurityManager.SECRET_LOOKUP_KEY)
-  assert(secret != null)
-  assert(new String(secret, UTF_8) === mgr.getSecretKey())
+  test("secret key generation") {
+Seq(
+  ("yarn", true),
+  ("local", true),
+  ("local[*]", true),
+  ("local[1,2]", true),
--- End diff --

nit: `local[1,2]` -> `local[1, 2]`


---




[GitHub] spark pull request #20660: [SPARK-23490][SQL]Check storage.locationUri with ...

2018-02-22 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/20660

[SPARK-23490][SQL]Check storage.locationUri with existing table in 
CreateTable

## What changes were proposed in this pull request?

For CreateTable with Append mode, we should check whether `storage.locationUri` is the same as the existing table's in `PreprocessTableCreation`.

In the current code, there is only an uninformative exception when the `storage.locationUri` differs from the existing table's:
`org.apache.spark.sql.AnalysisException: Table or view not found:`

which can be improved.



## How was this patch tested?

Unit test



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark locationUri

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20660.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20660


commit d72f56d2cb4a4a3ba56f1013d0ddd21b145d299c
Author: Wang Gengliang 
Date:   2018-02-22T15:35:28Z

Check storage.locationUri with existing table in CreateTable




---




[GitHub] spark issue #20660: [SPARK-23490][SQL]Check storage.locationUri with existin...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20660
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1010/
Test PASSed.


---




[GitHub] spark issue #20660: [SPARK-23490][SQL]Check storage.locationUri with existin...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20660
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20660: [SPARK-23490][SQL]Check storage.locationUri with existin...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20660
  
**[Test build #87618 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87618/testReport)**
 for PR 20660 at commit 
[`d72f56d`](https://github.com/apache/spark/commit/d72f56d2cb4a4a3ba56f1013d0ddd21b145d299c).


---




[GitHub] spark issue #20645: SPARK-23472: Add defaultJavaOptions for drivers and exec...

2018-02-22 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20645
  
I agree it would be nicer to have this be a more general feature. I would prefer an approach which didn't require a different configuration name, since that is more to document and for users to keep track of. An alternative would be another syntax for appending to an option, perhaps `++=` as in Scala, e.g. 
"--conf spark.executor.extraJavaOptions++=..."

You'd still need to tag the ConfigBuilder with what separator to use to 
merge the values together.
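A sketch of the proposed append semantics, with hypothetical names (not Spark's ConfigBuilder API): a value set via the `++=` syntax would be joined to the existing value using a separator that the config owner declares, e.g. a space for JVM option lists.

```java
// Hypothetical sketch of "++=" append semantics: the new value is joined to the
// existing one with a per-config separator (a space suits JVM option lists).
// Names here are illustrative, not Spark's actual configuration API.
public class AppendConf {
    static String appendValue(String existing, String appended, String separator) {
        if (existing == null || existing.isEmpty()) return appended;
        return existing + separator + appended;
    }

    public static void main(String[] args) {
        String merged = appendValue("-Xmx1g", "-XX:+UseG1GC", " ");
        if (!merged.equals("-Xmx1g -XX:+UseG1GC")) throw new AssertionError();
        System.out.println(merged);               // prints -Xmx1g -XX:+UseG1GC
    }
}
```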


---




[GitHub] spark pull request #20652: [SPARK-23476][CORE] Generate secret in local mode...

2018-02-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20652#discussion_r170003945
  
--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala 
---
@@ -440,23 +440,41 @@ class SecurityManagerSuite extends SparkFunSuite with 
ResetSystemProperties {
 assert(keyFromEnv === new SecurityManager(conf2).getSecretKey())
   }
 
-  test("secret key generation in yarn mode") {
-val conf = new SparkConf()
-  .set(NETWORK_AUTH_ENABLED, true)
-  .set(SparkLauncher.SPARK_MASTER, "yarn")
-val mgr = new SecurityManager(conf)
-
-UserGroupInformation.createUserForTesting("authTest", Array()).doAs(
-  new PrivilegedExceptionAction[Unit]() {
-override def run(): Unit = {
-  mgr.initializeAuth()
-  val creds = 
UserGroupInformation.getCurrentUser().getCredentials()
-  val secret = 
creds.getSecretKey(SecurityManager.SECRET_LOOKUP_KEY)
-  assert(secret != null)
-  assert(new String(secret, UTF_8) === mgr.getSecretKey())
+  test("secret key generation") {
+Seq(
+  ("yarn", true),
+  ("local", true),
+  ("local[*]", true),
+  ("local[1,2]", true),
--- End diff --

Fixed.


---




[GitHub] spark issue #20652: [SPARK-23476][CORE] Generate secret in local mode when a...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20652
  
**[Test build #87619 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87619/testReport)**
 for PR 20652 at commit 
[`e01feef`](https://github.com/apache/spark/commit/e01feeff30ed226d638055fed0238ef0e1c0c69e).


---




[GitHub] spark pull request #20658: [SPARK-23488][python] Add missing catalog methods...

2018-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20658#discussion_r170003647
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -137,6 +138,78 @@ def listColumns(self, tableName, dbName=None):
 isBucket=jcolumn.isBucket()))
 return columns
 
+@ignore_unicode_prefix
+# TODO: @since() decorator?
--- End diff --

I think `@since(2.4)`.


---




[GitHub] spark pull request #20658: [SPARK-23488][python] Add missing catalog methods...

2018-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20658#discussion_r170003442
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -28,7 +28,7 @@
 Database = namedtuple("Database", "name description locationUri")
 Table = namedtuple("Table", "name database description tableType 
isTemporary")
 Column = namedtuple("Column", "name description dataType nullable 
isPartition isBucket")
-Function = namedtuple("Function", "name description className isTemporary")
+Function = namedtuple("Function", "name database description className 
isTemporary")
--- End diff --

Hm, wouldn't this break backward compatibility?


---




[GitHub] spark pull request #20653: [SPARK-23459][SQL] Improve the error message when...

2018-02-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20653#discussion_r170025695
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala
 ---
@@ -491,6 +491,22 @@ class FileSourceStrategySuite extends QueryTest with 
SharedSQLContext with Predi
 }
   }
 
+  test("SPARK-23459: Improve error message when specfieed unknown column 
in partition columns") {
--- End diff --

move it to SaveLoadSuite?


---




[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20624#discussion_r170031873
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -387,6 +390,101 @@ case class CatalogStatistics(
   }
 }
 
+/**
+ * This class of statistics for a column is used in [[CatalogTable]] to 
interact with metastore.
+ */
+case class CatalogColumnStat(
+  distinctCount: Option[BigInt] = None,
+  min: Option[String] = None,
+  max: Option[String] = None,
+  nullCount: Option[BigInt] = None,
+  avgLen: Option[Long] = None,
+  maxLen: Option[Long] = None,
+  histogram: Option[Histogram] = None) {
+
+  /**
+   * Returns a map from string to string that can be used to serialize the 
column stats.
+   * The key is the name of the column and name of the field (e.g. 
"colName.distinctCount"),
+   * and the value is the string representation for the value.
+   * min/max values are stored as Strings. They can be deserialized using
+   * [[ColumnStat.fromExternalString]].
+   *
+   * As part of the protocol, the returned map always contains a key 
called "version".
+   * In the case min/max values are null (None), they won't appear in the 
map.
+   */
+  def toMap(colName: String): Map[String, String] = {
+val map = new scala.collection.mutable.HashMap[String, String]
+map.put(s"${colName}.${CatalogColumnStat.KEY_VERSION}", "1")
+distinctCount.foreach { v =>
+  map.put(s"${colName}.${CatalogColumnStat.KEY_DISTINCT_COUNT}", 
v.toString)
+}
+nullCount.foreach { v =>
+  map.put(s"${colName}.${CatalogColumnStat.KEY_NULL_COUNT}", 
v.toString)
+}
+avgLen.foreach { v => 
map.put(s"${colName}.${CatalogColumnStat.KEY_AVG_LEN}", v.toString) }
+maxLen.foreach { v => 
map.put(s"${colName}.${CatalogColumnStat.KEY_MAX_LEN}", v.toString) }
+min.foreach { v => 
map.put(s"${colName}.${CatalogColumnStat.KEY_MIN_VALUE}", v) }
+max.foreach { v => 
map.put(s"${colName}.${CatalogColumnStat.KEY_MAX_VALUE}", v) }
+histogram.foreach { h =>
+  map.put(s"${colName}.${CatalogColumnStat.KEY_HISTOGRAM}", 
HistogramSerializer.serialize(h))
+}
+map.toMap
+  }
+
+  /** Convert [[CatalogColumnStat]] to [[ColumnStat]]. */
+  def toPlanStat(
+  colName: String,
+  dataType: DataType): ColumnStat =
+ColumnStat(
+  distinctCount = distinctCount,
+  min = min.map(ColumnStat.fromExternalString(_, colName, dataType)),
+  max = max.map(ColumnStat.fromExternalString(_, colName, dataType)),
+  nullCount = nullCount,
+  avgLen = avgLen,
+  maxLen = maxLen,
+  histogram = histogram)
+}
+
+object CatalogColumnStat extends Logging {
+
+  // List of string keys used to serialize CatalogColumnStat
+  val KEY_VERSION = "version"
+  private val KEY_DISTINCT_COUNT = "distinctCount"
+  private val KEY_MIN_VALUE = "min"
+  private val KEY_MAX_VALUE = "max"
+  private val KEY_NULL_COUNT = "nullCount"
+  private val KEY_AVG_LEN = "avgLen"
+  private val KEY_MAX_LEN = "maxLen"
+  private val KEY_HISTOGRAM = "histogram"
+
+  /**
+   * Creates a [[CatalogColumnStat]] object from the given map.
+   * This is used to deserialize column stats from some external storage.
+   * The serialization side is defined in [[CatalogColumnStat.toMap]].
+   */
+  def fromMap(
+table: String,
+colName: String,
+map: Map[String, String]): Option[CatalogColumnStat] = {
+
+try {
+  Some(CatalogColumnStat(
+distinctCount = map.get(s"${colName}.${KEY_DISTINCT_COUNT}").map(v => BigInt(v.toLong)),
--- End diff --

Do we have a migration issue here? Now the key is changed. Can Spark 2.4 read the catalog properties written by Spark 2.3?
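
For context, the serialization scheme under discussion stores each statistic under a `colName.field` key in a string-to-string map. A minimal, hypothetical Java sketch of that round trip (simplified; not the actual Spark implementation) shows why a renamed key silently drops the stat for old readers:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ColumnStatMapDemo {
    // Serialize a distinct count under the "colName.distinctCount" key scheme.
    static Map<String, String> toMap(String colName, long distinctCount) {
        Map<String, String> map = new HashMap<>();
        map.put(colName + ".version", "1");
        map.put(colName + ".distinctCount", Long.toString(distinctCount));
        return map;
    }

    // Deserialize; returns empty if the expected key is absent, which is
    // exactly what happens when a reader looks up a renamed key.
    static Optional<Long> distinctCountFromMap(String colName, Map<String, String> map) {
        return Optional.ofNullable(map.get(colName + ".distinctCount")).map(Long::parseLong);
    }

    public static void main(String[] args) {
        Map<String, String> props = toMap("age", 42);
        System.out.println(distinctCountFromMap("age", props).orElse(-1L)); // 42
        // A reader that expects a different key name sees no stats at all:
        System.out.println(props.containsKey("age.numDistinct")); // false
    }
}
```

The `age.numDistinct` key here is made up purely to illustrate the failure mode the reviewer is asking about.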


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20624#discussion_r170028784
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -387,6 +390,101 @@ case class CatalogStatistics(
   }
 }
 
+/**
+ * This class of statistics for a column is used in [[CatalogTable]] to interact with metastore.
+ */
+case class CatalogColumnStat(
+  distinctCount: Option[BigInt] = None,
+  min: Option[String] = None,
+  max: Option[String] = None,
+  nullCount: Option[BigInt] = None,
+  avgLen: Option[Long] = None,
+  maxLen: Option[Long] = None,
+  histogram: Option[Histogram] = None) {
--- End diff --

Nit: indents.


---




[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...

2018-02-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20362
  
ping @srowen 


---




[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20651
  
**[Test build #87617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87617/testReport)** for PR 20651 at commit [`e04bf24`](https://github.com/apache/spark/commit/e04bf24ed0a712aa380de8dd929c7c1d4fdb2990).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20651
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87617/
Test PASSed.


---




[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20651
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20362
  
**[Test build #4126 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4126/testReport)** for PR 20362 at commit [`acfb092`](https://github.com/apache/spark/commit/acfb0920d0c5ae4601580cec633f2c3b87a2673f).


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r170047180
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -693,7 +766,7 @@ class ALSSuite
 val data = ratings.toDF
 val model = new ALS().fit(data)
 Seq("nan", "NaN", "Nan", "drop", "DROP", "Drop").foreach { s =>
-  model.setColdStartStrategy(s).transform(data)
+  testTransformer[Rating[Int]](data, model.setColdStartStrategy(s), "prediction") { _ => }
--- End diff --

What's the no-op function at the end for -- just because it requires an 
argument?
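
Presumably the helper's last parameter is a required callback over the transformed rows, so callers that only care that the transform succeeds must pass a no-op. A hypothetical Java analogue (not the actual Spark test API) of that design:

```java
import java.util.List;
import java.util.function.Consumer;

public class TestTransformerDemo {
    // A hypothetical helper whose last parameter is a mandatory callback
    // invoked with the transformed rows.
    static void testTransformer(List<String> rows, Consumer<List<String>> check) {
        // ... run the transformation (elided) ...
        check.accept(rows); // the callback is required by the signature
    }

    public static void main(String[] args) {
        // Equivalent of Scala's `{ _ => }`: a no-op, because the caller only
        // asserts that the transform itself does not throw.
        testTransformer(List.of("prediction"), r -> {});
        System.out.println("ok");
    }
}
```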


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r170046857
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -413,34 +411,36 @@ class ALSSuite
   .setSeed(0)
 val alpha = als.getAlpha
 val model = als.fit(training.toDF())
-val predictions = model.transform(test.toDF()).select("rating", "prediction").rdd.map {
-  case Row(rating: Float, prediction: Float) =>
-(rating.toDouble, prediction.toDouble)
+testTransformerByGlobalCheckFunc[Rating[Int]](test.toDF(), model, "rating", "prediction") {
+case rows: Seq[Row] =>
+  val predictions = rows.map(row => (row.getFloat(0).toDouble, row.getFloat(1).toDouble))
+
+  val rmse =
+if (implicitPrefs) {
+  // TODO: Use a better (rank-based?) evaluation metric for implicit feedback.
+  // We limit the ratings and the predictions to interval [0, 1] and compute the
+  // weighted RMSE with the confidence scores as weights.
+  val (totalWeight, weightedSumSq) = predictions.map { case (rating, prediction) =>
+val confidence = 1.0 + alpha * math.abs(rating)
+val rating01 = math.max(math.min(rating, 1.0), 0.0)
+val prediction01 = math.max(math.min(prediction, 1.0), 0.0)
+val err = prediction01 - rating01
+(confidence, confidence * err * err)
+  }.reduce[(Double, Double)] { case ((c0, e0), (c1, e1)) =>
+(c0 + c1, e0 + e1)
+  }
+  math.sqrt(weightedSumSq / totalWeight)
+} else {
+  val errorSquares = predictions.map { case (rating, prediction) =>
+val err = rating - prediction
+err * err
+  }
+  val mse = errorSquares.sum / errorSquares.length
+  math.sqrt(mse)
+}
+  logInfo(s"Test RMSE is $rmse.")
+  assert(rmse < targetRMSE)
 }
-val rmse =
--- End diff --

This change is just a move, really? or did something else change as well?
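
For reference, the weighted-RMSE computation being moved in the diff can be sketched on its own, under the same clamping and confidence-weight conventions (the sample data below is made up):

```java
public class WeightedRmseDemo {
    // Weighted RMSE for implicit feedback: clamp ratings and predictions to
    // [0, 1] and weight each squared error by confidence = 1 + alpha * |rating|.
    static double weightedRmse(double[] ratings, double[] predictions, double alpha) {
        double totalWeight = 0.0;
        double weightedSumSq = 0.0;
        for (int i = 0; i < ratings.length; i++) {
            double confidence = 1.0 + alpha * Math.abs(ratings[i]);
            double rating01 = Math.max(Math.min(ratings[i], 1.0), 0.0);
            double prediction01 = Math.max(Math.min(predictions[i], 1.0), 0.0);
            double err = prediction01 - rating01;
            totalWeight += confidence;
            weightedSumSq += confidence * err * err;
        }
        return Math.sqrt(weightedSumSq / totalWeight);
    }

    public static void main(String[] args) {
        // Two observations, each off by 0.1 after clamping: RMSE is ~0.1.
        double rmse = weightedRmse(new double[]{1.0, 0.0}, new double[]{0.9, 0.1}, 1.0);
        System.out.println(rmse); // approximately 0.1
    }
}
```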


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r170046788
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -586,6 +586,68 @@ class ALSSuite
   allModelParamSettings, checkModelData)
   }
 
+  private def checkNumericTypesALS(
--- End diff --

Is this mostly just moving the code from the test class, or did it change 
too?


---




[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-22 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20622
  
The difference in ContinuousExecution is that the thread isn't doing any 
metadata work like looking for new batches - it's either running the Spark job 
or cleaning up after finishing it.

In any case, this sounds like something we aren't plausibly going to solve 
here, so I'll update the PR to just resolve the flakiness in the way you 
suggest.


---




[GitHub] spark issue #10942: [SPARK-12850] [SQL] Support Bucket Pruning (Predicate Pu...

2018-02-22 Thread lonehacker
Github user lonehacker commented on the issue:

https://github.com/apache/spark/pull/10942
  
@gatorsmile Can you confirm if this feature is active in current master? It 
seems like this code was removed in [SPARK-14535][SQL] Remove buildInternalScan 
from FileFormat


---




[GitHub] spark issue #20645: SPARK-23472: Add defaultJavaOptions for drivers and exec...

2018-02-22 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20645
  
I like the `ConfigBuilder` approach. That would make this much more useful. 
I'll add an implementation like that.

I think the append-option syntax would be confusing for users and administrators. We need to make sure it is not easy to overwrite the default options, and we can't expect users to know that they need to append, or how to do it. Administrators could add `++=` options, but then we would be applying these in the wrong order to get the behavior we're after, which is bad because order matters in some cases. I like the two-property approach better because it is simpler and more predictable.
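
A sketch of the two-property scheme described above: administrator defaults are emitted before user extras, so on JVM semantics a later repeated flag (e.g. a second `-Xmx`) wins. The property names and helper here are hypothetical, not actual Spark configs:

```java
public class JavaOptionsDemo {
    // Combine administrator defaults (hypothetical "defaultJavaOptions") with
    // user extras (hypothetical "extraJavaOptions"). Defaults come first so
    // user-supplied flags override them without erasing the rest.
    static String effectiveOptions(String defaultOpts, String extraOpts) {
        String combined = (defaultOpts + " " + extraOpts).trim();
        return combined.replaceAll("\\s+", " "); // normalize whitespace
    }

    public static void main(String[] args) {
        System.out.println(effectiveOptions("-XX:+UseG1GC -Xmx2g", "-Xmx4g"));
        // prints "-XX:+UseG1GC -Xmx2g -Xmx4g"; the JVM applies the last -Xmx (4g)
    }
}
```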


---




[GitHub] spark issue #20622: [SPARK-23491][SS] Remove explicit job cancellation from ...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20622
  
**[Test build #87620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87620/testReport)** for PR 20622 at commit [`0e5e52f`](https://github.com/apache/spark/commit/0e5e52f2c6b934a372d098a0d7780da18d3f99e0).


---




[GitHub] spark issue #20660: [SPARK-23490][SQL]Check storage.locationUri with existin...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20660
  
**[Test build #87618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87618/testReport)** for PR 20660 at commit [`d72f56d`](https://github.com/apache/spark/commit/d72f56d2cb4a4a3ba56f1013d0ddd21b145d299c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20660: [SPARK-23490][SQL]Check storage.locationUri with existin...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20660
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20660: [SPARK-23490][SQL]Check storage.locationUri with existin...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20660
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87618/
Test PASSed.


---




[GitHub] spark pull request #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20651#discussion_r170057919
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllStagesPage.scala ---
@@ -172,6 +185,20 @@ private[ui] class AllStagesPage(parent: StagesTab) extends WebUIPage("") {
   {completedStagesTable.toNodeSeq}
 
 }
+if (shouldShowSkippedStages) {
+  content ++=
+

[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-02-22 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20657
  
retest this please


---




[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-02-22 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20657
  
Known flaky (SPARK-23458).


---




[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20651
  
LGTM, merging to master / 2.3.


---




[GitHub] spark issue #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20651
  
Merged to master, but there were conflicts in 2.3; please open a separate 
PR for the backport.


---




[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20657
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1011/
Test PASSed.


---




[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20657
  
**[Test build #87621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87621/testReport)** for PR 20657 at commit [`2c3448d`](https://github.com/apache/spark/commit/2c3448dd3aa4071234a65a1c9317b1a3c4fe8d24).


---




[GitHub] spark pull request #20651: [SPARK-23475][UI] Show also skipped stages

2018-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20651


---




[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20362
  
**[Test build #4126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4126/testReport)** for PR 20362 at commit [`acfb092`](https://github.com/apache/spark/commit/acfb0920d0c5ae4601580cec633f2c3b87a2673f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-22 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r170068231
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +44,149 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber method so that
+   * this can be modified by the TaskMemoryManager, which lives in a different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not supported now");
+}
+return mb;
+  }
+
+  /**
+   * Instantiate the same type of MemoryBlock with new offset and size
+   */
+  public abstract MemoryBlock allocate(long offset, long size);
+
+
+  public abstract int getInt(long offset);
--- End diff --

Yeah, this is a part where we should clean up. This is derived from the 
conventional usage.
Let me think about this cleanup.
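
The `instanceof` dispatch in `allocateFromObject` can be sketched independently of Spark. The simplified classes below are hypothetical stand-ins, not the actual Unsafe-backed MemoryBlock hierarchy:

```java
public class MemoryBlockDispatchDemo {
    // Simplified stand-ins for the MemoryBlock subclasses in the diff.
    static abstract class Block { final long length; Block(long l) { length = l; } }
    static final class ByteArrayBlock extends Block { ByteArrayBlock(long l) { super(l); } }
    static final class OnHeapBlock extends Block { OnHeapBlock(long l) { super(l); } }
    static final class OffHeapBlock extends Block { OffHeapBlock(long l) { super(l); } }

    // Pick the concrete block type from the backing object's runtime type;
    // a null base object is taken to mean off-heap memory.
    static Block allocateFromObject(Object obj, long length) {
        if (obj instanceof byte[]) {
            return new ByteArrayBlock(length);
        } else if (obj instanceof long[]) {
            return new OnHeapBlock(length);
        } else if (obj == null) {
            return new OffHeapBlock(length);
        }
        throw new UnsupportedOperationException(obj.getClass() + " is not supported");
    }

    public static void main(String[] args) {
        System.out.println(allocateFromObject(new byte[8], 8).getClass().getSimpleName());
        System.out.println(allocateFromObject(null, 16).getClass().getSimpleName());
        // prints ByteArrayBlock then OffHeapBlock
    }
}
```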


---




[GitHub] spark issue #20652: [SPARK-23476][CORE] Generate secret in local mode when a...

2018-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20652
  
**[Test build #87619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87619/testReport)** for PR 20652 at commit [`e01feef`](https://github.com/apache/spark/commit/e01feeff30ed226d638055fed0238ef0e1c0c69e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20652: [SPARK-23476][CORE] Generate secret in local mode when a...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87619/
Test PASSed.


---




[GitHub] spark issue #20652: [SPARK-23476][CORE] Generate secret in local mode when a...

2018-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20652
  
Merged build finished. Test PASSed.


---



