[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22288
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22318
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22318
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95723/


---




[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22318
  
**[Test build #95723 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95723/testReport)**
 for PR 22318 at commit 
[`809b8a8`](https://github.com/apache/spark/commit/809b8a83b7ec3d62ba6d65f6aff6a7d3175bd3e3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22288
  
**[Test build #95729 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95729/testReport)**
 for PR 22288 at commit 
[`87c4e57`](https://github.com/apache/spark/commit/87c4e57bb966078c8a78eabc5a5e4b6f60c78f28).


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95727/


---




[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95721/


---




[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22328
  
**[Test build #95727 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95727/testReport)**
 for PR 22328 at commit 
[`218ce4c`](https://github.com/apache/spark/commit/218ce4cf796308c8705a27889b25100e2b779365).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22338
  
**[Test build #95721 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95721/testReport)**
 for PR 22338 at commit 
[`6cb94ed`](https://github.com/apache/spark/commit/6cb94ed5ed76e23e4fb775a2fd6f0e66e9c15abd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22340
  
Thank you for the approval, @srowen.


---




[GitHub] spark pull request #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ig...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22336


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22336
  
Wow, great. Thanks, @gatorsmile!


---




[GitHub] spark issue #22335: [SPARK-25091][SQL] reduce the storage memory in Executor...

2018-09-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22335
  
Also, this should be `[core]` not `[sql]`.


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22336
  
Sorry... I just merged it.


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22336
  
Finally, it passed. Can I merge this, @cloud-fan and @HyukjinKwon? :)


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22336
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95722/


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22336
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22336
  
**[Test build #95722 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95722/testReport)**
 for PR 22336 at commit 
[`69f207f`](https://github.com/apache/spark/commit/69f207f8a4531435c4a8df790780557968a33bb1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...

2018-09-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/22320#discussion_r215376132
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ---
@@ -82,7 +83,7 @@ case class CreateHiveTableAsSelectCommand(
   query,
   overwrite = true,
   ifPartitionNotExists = false,
-  outputColumns = outputColumns).run(sparkSession, child)
+  outputColumnNames = outputColumnNames).run(sparkSession, child)
--- End diff --

`outputColumnNames` themselves. Specifying `outputColumnNames` as the name of the parameter to set using `outputColumnNames` does nothing but introduce duplication. If you removed one `outputColumnNames`, comprehension would not suffer at all, would it?
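To make the point concrete, a contrived sketch (the `run` method here is hypothetical, not the PR's code):

```scala
def run(outputColumnNames: Seq[String]): Unit =
  println(outputColumnNames.mkString(", "))

val outputColumnNames = Seq("id", "value")
run(outputColumnNames = outputColumnNames) // the named form repeats the identifier
run(outputColumnNames)                     // the positional form says the same thing
```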


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-09-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22112
  
The current solution looks good to me for unblocking the Apache Spark 2.4 release. We definitely should continue improving the fix, as the other reviewers suggested above.


---




[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22192#discussion_r215371723
  
--- Diff: core/src/main/java/org/apache/spark/ExecutorPlugin.java ---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.apache.spark.annotation.DeveloperApi;
+
+/**
+ * A plugin which can be automatically instantiated within each Spark executor.  Users can specify
+ * plugins which should be created with the "spark.executor.plugins" configuration.  An instance
+ * of each plugin will be created for every executor, including those created by dynamic allocation,
+ * before the executor starts running any tasks.
+ *
+ * The specific API exposed to end users is still considered to be very unstable.  We will
+ * hopefully be able to keep compatibility by providing default implementations for any methods
+ * added, but make no guarantees this will always be possible across all Spark releases.
+ *
+ * Spark does nothing to verify the plugin is doing legitimate things, or to manage the resources
+ * it uses.  A plugin acquires the same privileges as the user running the task.  A bad plugin
+ * could also interfere with task execution and make the executor fail in unexpected ways.
+ */
+@DeveloperApi
+public interface ExecutorPlugin {
+
+  /**
+   * Initialization method that will be called during executor startup, in 
the same thread as
+   * the executor. Plugins should override this method to add in their 
initialization logic.
+   */
+  default void init() {}
--- End diff --

I think it should be either `init` & `shutdown`, or `start` & `stop`.
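For illustration, the consistently named pairing might look like this (sketched as a Scala trait; the PR's actual interface is Java):

```scala
trait ExecutorPlugin {
  /** Called once during executor startup, before any tasks run. */
  def init(): Unit = {}

  /** Called once while the executor shuts down; release resources here. */
  def shutdown(): Unit = {}
}
```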


---




[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22192#discussion_r215372559
  
--- Diff: core/src/main/java/org/apache/spark/ExecutorPlugin.java ---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.apache.spark.annotation.DeveloperApi;
+
+/**
+ * A plugin which can be automatically instantiated within each Spark executor.  Users can specify
+ * plugins which should be created with the "spark.executor.plugins" configuration.  An instance
+ * of each plugin will be created for every executor, including those created by dynamic allocation,
+ * before the executor starts running any tasks.
+ *
+ * The specific API exposed to end users is still considered to be very unstable.  We will
+ * hopefully be able to keep compatibility by providing default implementations for any methods
+ * added, but make no guarantees this will always be possible across all Spark releases.
+ *
+ * Spark does nothing to verify the plugin is doing legitimate things, or to manage the resources
+ * it uses.  A plugin acquires the same privileges as the user running the task.  A bad plugin
+ * could also interfere with task execution and make the executor fail in unexpected ways.
+ */
+@DeveloperApi
+public interface ExecutorPlugin {
+
+  /**
+   * Initialization method that will be called during executor startup, in 
the same thread as
--- End diff --

Since this is a new public API and thus this is public documentation, 
follow javadoc rules:

- first sentence should be short
- break things into paragraphs as needed (separated with `<p>`).

e.g. 

```
Callback for initializing an executor plugin. This is called synchronously 
from the executor initialization thread.
```
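Put together, a doc comment following those rules might read like this (a sketch, not a literal suggestion from the thread):

```scala
trait ExecutorPlugin {
  /**
   * Callback for initializing an executor plugin.
   *
   * <p>This is called synchronously from the executor initialization thread,
   * before the executor runs any tasks. Plugins should put their setup
   * logic here.
   */
  def init(): Unit = {}
}
```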


---




[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22192#discussion_r215373277
  
--- Diff: core/src/main/java/org/apache/spark/ExecutorPlugin.java ---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.apache.spark.annotation.DeveloperApi;
+
+/**
+ * A plugin which can be automatically instantiated within each Spark executor.  Users can specify
+ * plugins which should be created with the "spark.executor.plugins" configuration.  An instance
+ * of each plugin will be created for every executor, including those created by dynamic allocation,
+ * before the executor starts running any tasks.
+ *
+ * The specific API exposed to end users is still considered to be very unstable.  We will
+ * hopefully be able to keep compatibility by providing default implementations for any methods
+ * added, but make no guarantees this will always be possible across all Spark releases.
+ *
+ * Spark does nothing to verify the plugin is doing legitimate things, or to manage the resources
+ * it uses.  A plugin acquires the same privileges as the user running the task.  A bad plugin
+ * could also interfere with task execution and make the executor fail in unexpected ways.
+ */
+@DeveloperApi
+public interface ExecutorPlugin {
+
+  /**
+   * Initialization method that will be called during executor startup, in 
the same thread as
+   * the executor. Plugins should override this method to add in their 
initialization logic.
+   */
+  default void init() {}
+
+  /**
+   * Stop method, to be called when the executor is shutting down. Plugins 
should clean up
--- End diff --

Prefer:

```
Callback for cleaning up executor plugin resources. Called when the 
executor is shutting down.
```

"Prepare to terminate" is also a little vague; what does that mean? 
Shouldn't the plugin be terminated already when this method returns? Otherwise 
there are no guarantees of what happens, right?
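i.e., something like the following hypothetical plugin, where nothing is left running once `shutdown()` returns:

```scala
class PollingPlugin {
  private var poller: Thread = _

  def init(): Unit = {
    // Stand-in for real background work done by a plugin.
    poller = new Thread(() =>
      try { while (true) Thread.sleep(1000) }
      catch { case _: InterruptedException => () })
    poller.start()
  }

  def shutdown(): Unit = {
    poller.interrupt()
    poller.join() // the plugin really is terminated once this returns
  }
}
```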


---




[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22325#discussion_r215368719
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 
---
@@ -444,10 +444,23 @@ final class ShuffleBlockFetcherIterator(
   throwFetchFailedException(blockId, address, e)
   }
 
-  input = streamWrapper(blockId, in)
+  var wrapCorruption: Boolean = false
--- End diff --

Wouldn't it be better to try to merge this block into the existing code 
below?

Something along the lines of:

```
try {
  val input = streamWrapper(blockId, in)
  if (detectCorrupt && !wrapCorruption && !input.eq(in) && size < maxBytesInFlight / 3) {
    // do the corruption detection
  }
} catch {
  // existing catch cases from around L474
}
```

It seems you're trying to achieve the same things as the existing code 
(e.g. retry on IOException) so it would be nice to re-use the same code for 
that.


---




[GitHub] spark issue #21404: [SPARK-24360][SQL] Support Hive 3.0 metastore

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21404
  
Thank you for the review, @tooptoop4 and @wangyum.
I'm going to update this to the latest Hive 3.1.0.


---




[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22333
  
**[Test build #95728 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95728/testReport)**
 for PR 22333 at commit 
[`14d1017`](https://github.com/apache/spark/commit/14d10172f9d62ba9a8eabec5aaa11759757278bf).


---




[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22333
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22333
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2882/


---




[GitHub] spark pull request #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it'...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22333#discussion_r215365678
  
--- Diff: build/mvn ---
@@ -91,15 +92,23 @@ install_mvn() {
 
 # Install zinc under the build/ folder
 install_zinc() {
-  local zinc_path="zinc-0.3.15/bin/zinc"
-  [ ! -f "${_DIR}/${zinc_path}" ] && ZINC_INSTALL_FLAG=1
-  local TYPESAFE_MIRROR=${TYPESAFE_MIRROR:-https://downloads.lightbend.com}
+  ZINC_VERSION=0.3.15
--- End diff --

Oh, thanks! Sure.


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-09-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22112
  
@tgravescs thanks for testing it out! I've created 
https://issues.apache.org/jira/browse/SPARK-25341 and 
https://issues.apache.org/jira/browse/SPARK-25342 to track the followup.

I think these two together are the long-term solution. Users can do a sort/checkpoint to eliminate the indeterminacy, or use reliable shuffle storage to avoid fetch failures (someone is proposing this on the dev list). If users can't avoid it and hit the issue, this PR provides a final guard to rerun some stages and get a correct result. For Spark 2.4 we just fail the job, and we will finish the above 2 tickets in Spark 3.0.
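For illustration, the checkpoint route mentioned above might look like this (a sketch, not code from this PR; the checkpoint dir is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/checkpoints") // placeholder path

// Checkpointing cuts the lineage: a later fetch failure won't trigger
// recomputation of the nondeterministic repartition output.
val shuffled = spark.sparkContext.parallelize(1 to 1000).repartition(8)
shuffled.checkpoint() // materialized on the first action below
println(shuffled.count())
```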


---




[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22340
  
- Since `runSparkSubmit` is part of `SparkSubmitTestUtils`, some test cases will fail if we always remove env variables like `SPARK_DIST_CLASSPATH`; it gives the child the correct test-time classpath.

- For executing an old Spark distribution, yes. `SPARK_DIST_CLASSPATH` was the direct root cause, and `SPARK_PREPEND_CLASSES` also doesn't look safe to me. I added the others as a proactive precaution.
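For illustration only, stripping such variables from a child process environment could look like this (a sketch; the variable list and command are illustrative, not the PR's code):

```scala
val tainted = Seq("SPARK_DIST_CLASSPATH", "SPARK_PREPEND_CLASSES", "SPARK_TESTING")

// Launch a child spark-submit without the test-time variables.
val pb = new ProcessBuilder("bin/spark-submit", "--version")
tainted.foreach(pb.environment().remove(_))
val exitCode = pb.inheritIO().start().waitFor()
println(s"child exited with $exitCode")
```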


---




[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-09-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22209
  
(Feel free to do it if you think it should be there, btw.)


---




[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-09-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22209
  
I guess it should be OK. Just trying to be conservative and not inadvertently make the branch less stable in the middle of an RC cycle...


---




[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22318
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95720/


---




[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22318
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22335: [SPARK-25091][SQL] reduce the storage memory in E...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22335#discussion_r215359589
  
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -509,6 +507,10 @@ private class LiveRDD(val info: RDDInfo) extends 
LiveEntity {
 
   private val distributions = new HashMap[String, LiveRDDDistribution]()
 
+  def isEmpty(): Boolean = {
+memoryUsed == 0L && diskUsed == 0L && partitions.isEmpty && 
distributions.isEmpty
--- End diff --

Shouldn't this be the equivalent of `storageLevel == None` instead? That's 
what 2.2 does:


https://github.com/apache/spark/blob/branch-2.2/core/src/main/scala/org/apache/spark/storage/StorageStatusListener.scala#L52
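A standalone sketch of the check being suggested (the real code would consult the RDD's tracked info):

```scala
import org.apache.spark.storage.StorageLevel

// Treat an RDD as removed when its storage level drops to NONE, rather
// than when all of its counters happen to reach zero.
def isRemoved(level: StorageLevel): Boolean = level == StorageLevel.NONE

println(isRemoved(StorageLevel.NONE))        // true
println(isRemoved(StorageLevel.MEMORY_ONLY)) // false
```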



---




[GitHub] spark pull request #22335: [SPARK-25091][SQL] reduce the storage memory in E...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22335#discussion_r215360659
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -710,9 +719,21 @@ private[spark] class AppStatusListener(
 val executorId = event.blockUpdatedInfo.blockManagerId.executorId
 
 // Whether values are being added to or removed from the existing 
accounting.
+// BlockManager always sends an empty block status message when the user tries to remove an RDD block,
+// so we try to get the removed block's size from the RDD partition to keep memory/disk storage sizes accurate.
 val storageLevel = event.blockUpdatedInfo.storageLevel
-val diskDelta = event.blockUpdatedInfo.diskSize * (if 
(storageLevel.useDisk) 1 else -1)
-val memoryDelta = event.blockUpdatedInfo.memSize * (if 
(storageLevel.useMemory) 1 else -1)
+val diskDelta: Long = storageLevel != StorageLevel.NONE match {
--- End diff --

A simple `if` is much clearer (and probably faster) here. But see the later comment: I believe it's better for this code to be in the "unpersist" callback, and to leave the block manager code unchanged.

This code is also wrong as it stands: if you're changing the storage level of an RDD (e.g. it was cached on disk, but after the update it's not), then this is doing the wrong thing now.

(So, another argument for treating the "unpersist" event differently, and in the appropriate callback.)
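To illustrate the style point with contrived values (not the PR's actual accounting):

```scala
val useDisk = true
val size = 10L

// Matching on a Boolean, roughly as the diff does:
val deltaViaMatch = useDisk match {
  case true  => size
  case false => -size
}

// The plain `if` says the same thing more directly:
val deltaViaIf = if (useDisk) size else -size
assert(deltaViaMatch == deltaViaIf)
```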


---




[GitHub] spark pull request #22335: [SPARK-25091][SQL] reduce the storage memory in E...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22335#discussion_r215360036
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1549,7 +1549,7 @@ private[spark] class BlockManager(
 // TODO: Avoid a linear scan by creating another mapping of RDD.id to 
blocks.
 logInfo(s"Removing RDD $rddId")
 val blocksToRemove = 
blockInfoManager.entries.flatMap(_._1.asRDDId).filter(_.rddId == rddId)
-blocksToRemove.foreach { blockId => removeBlock(blockId, tellMaster = 
false) }
--- End diff --

I don't think this should be changing. Instead, the `onUnpersistRDD` 
callback in the listener should be updating the executor data it's tracking 
when the RDD is being unpersisted. That's similar to what I understand was 
being done in 2.2.


---




[GitHub] spark pull request #22335: [SPARK-25091][SQL] reduce the storage memory in E...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22335#discussion_r215359111
  
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -120,7 +118,7 @@ private class LiveTask(
 stageAttemptId: Int,
 lastUpdateTime: Option[Long]) extends LiveEntity {
 
-  import LiveEntityHelpers._
--- End diff --

None of these import changes are necessary...


---




[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22318
  
**[Test build #95720 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95720/testReport)**
 for PR 22318 at commit 
[`809b8a8`](https://github.com/apache/spark/commit/809b8a83b7ec3d62ba6d65f6aff6a7d3175bd3e3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-09-05 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/22209
  
Any reason not to merge to 2.3? It's a bug in 2.3.


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22328
  
**[Test build #95727 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95727/testReport)**
 for PR 22328 at commit 
[`218ce4c`](https://github.com/apache/spark/commit/218ce4cf796308c8705a27889b25100e2b779365).


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22335: [SPARK-25091][SQL] reduce the storage memory in E...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22335#discussion_r215358096
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -646,8 +646,17 @@ private[spark] class AppStatusListener(
   }
 
   override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = {
-liveRDDs.remove(event.rddId)
-kvstore.delete(classOf[RDDStorageInfoWrapper], event.rddId)
+while (true) {
--- End diff --

There's a single thread that calls this method (or any other listener 
method in this class). But I'm kinda wondering what's the point of the loop in 
the first place...

It seems you'll get into an infinite loop if for some reason 
`rdd.isEmpty()` returns false.


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2881/


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22328
  
The image data source tests passed, but the JVM crashed at the end. Triggered another test.


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22328
  
test this please


---




[GitHub] spark pull request #22140: [SPARK-25072][PySpark] Forbid extra value for cus...

2018-09-05 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/22140#discussion_r215356501
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1397,6 +1397,8 @@ def _create_row_inbound_converter(dataType):
 
 
 def _create_row(fields, values):
+if len(values) > len(fields):
+raise ValueError("Can not create %s by %s" % (fields, values))
--- End diff --

I'd like to improve this message a little, maybe "Can not create Row with 
fields %s, expected %d values but got %s" % (fields, len(fields), values)


---




[GitHub] spark pull request #22140: [SPARK-25072][PySpark] Forbid extra value for cus...

2018-09-05 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/22140#discussion_r215355538
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -269,6 +269,10 @@ def test_struct_field_type_name(self):
 struct_field = StructField("a", IntegerType())
 self.assertRaises(TypeError, struct_field.typeName)
 
+def test_invalid_create_row(slef):
--- End diff --

typo: `slef` -> `self`


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attibutes fr...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attibutes fr...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95718/


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attibutes fr...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22326
  
**[Test build #95718 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95718/testReport)**
 for PR 22326 at commit 
[`777b881`](https://github.com/apache/spark/commit/777b881df5f577d2bb4100b96748692d494c54c9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22112
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95713/


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22112
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22112
  
**[Test build #95713 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95713/testReport)**
 for PR 22112 at commit 
[`8952d08`](https://github.com/apache/spark/commit/8952d082b7b9082d38f5b332ccded2d2d7c96b08).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22325
  
**[Test build #95726 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95726/testReport)**
 for PR 22325 at commit 
[`ec069b3`](https://github.com/apache/spark/commit/ec069b3d269b6ec2e6ca9edcc0f6981ccecdad3d).


---




[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22209


---




[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22325
  
ok to test


---




[GitHub] spark pull request #22329: [SPARK-25328][PYTHON] Add an example for having t...

2018-09-05 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/22329#discussion_r215345817
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2804,6 +2804,22 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
|  1|1.5|
|  2|6.0|
+---+---+
+   >>> @pandas_udf(
+   ..."id long, additional_key double, v double",
--- End diff --

Sorry, I know you just changed it, but I think just naming the column "ceil(v1 / 2)" with type `long` would be a little clearer. Although "additional_key" is OK too, if you guys want to keep that.


---




[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

2018-09-05 Thread ankuriitg
Github user ankuriitg commented on a diff in the pull request:

https://github.com/apache/spark/pull/22209#discussion_r215345106
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala ---
@@ -77,7 +77,14 @@ class UISeleniumSuite
 inputStream.foreachRDD { rdd =>
   rdd.foreach(_ => {})
   try {
-rdd.foreach(_ => throw new RuntimeException("Oops"))
+rdd.foreach { _ =>
+  import org.apache.spark.TaskContext
--- End diff --

Thanks!


---




[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-09-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22209
  
Merging to master.


---




[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22209#discussion_r215344056
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala ---
@@ -77,7 +77,14 @@ class UISeleniumSuite
 inputStream.foreachRDD { rdd =>
   rdd.foreach(_ => {})
   try {
-rdd.foreach(_ => throw new RuntimeException("Oops"))
+rdd.foreach { _ =>
+  import org.apache.spark.TaskContext
--- End diff --

This should be together with other imports. Will fix during merge.
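i.e., a skeleton of the preferred layout (illustrative only):

```scala
import org.apache.spark.TaskContext // grouped at the top with other imports

class ImportPlacementSketch {
  def body(): Unit = {
    Seq(1, 2, 3).foreach { _ =>
      // ...rather than `import org.apache.spark.TaskContext` here,
      // inside the closure body.
      val ctx = TaskContext.get() // illustrative use; null outside a task
      println(ctx)
    }
  }
}
```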


---




[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22209#discussion_r215343938
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
@@ -1190,6 +1190,61 @@ class AppStatusListenerSuite extends SparkFunSuite 
with BeforeAndAfter {
 assert(appStore.asOption(appStore.lastStageAttempt(3)) === None)
   }
 
+  test("SPARK-24415: update metrics for tasks that finish late") {
+val listener = new AppStatusListener(store, conf, true)
+
+val stage1 = new StageInfo(1, 0, "stage1", 4, Nil, Nil, "details1")
+val stage2 = new StageInfo(2, 0, "stage2", 4, Nil, Nil, "details2")
+
+// Start job
+listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage1, 
stage2), null))
+
+// Start 2 stages
+listener.onStageSubmitted(SparkListenerStageSubmitted(stage1, new 
Properties()))
+listener.onStageSubmitted(SparkListenerStageSubmitted(stage2, new 
Properties()))
+
+// Start 2 Tasks
+val tasks = createTasks(2, Array("1"))
+tasks.foreach { task =>
+  listener.onTaskStart(SparkListenerTaskStart(stage1.stageId, 
stage1.attemptNumber, task))
+}
+
+// Task 1 Finished
+time += 1
+tasks(0).markFinished(TaskState.FINISHED, time)
+listener.onTaskEnd(
+  SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType", 
Success, tasks(0), null))
+
+// Stage 1 Completed
+stage1.failureReason = Some("Failed")
+listener.onStageCompleted(SparkListenerStageCompleted(stage1))
+
+// Stop job 1
+time += 1
+listener.onJobEnd(SparkListenerJobEnd(1, time, JobSucceeded))
+
+// Task 2 Killed
+time += 1
+tasks(1).markFinished(TaskState.FINISHED, time)
+listener.onTaskEnd(
+  SparkListenerTaskEnd(stage1.stageId, stage1.attemptId, "taskType",
+TaskKilled(reason = "Killed"), tasks(1), null))
+
+// Ensure killed task metrics are updated
+val allStages = 
store.view(classOf[StageDataWrapper]).reverse().asScala.map(_.info)
+val failedStages = allStages.filter(_.status == v1.StageStatus.FAILED)
+assert(failedStages.size == 1)
+assert(failedStages.head.numKilledTasks == 1)
+assert(failedStages.head.numCompleteTasks == 1)
+
+val allJobs = 
store.view(classOf[JobDataWrapper]).reverse().asScala.map(_.info)
+assert(allJobs.size == 1)
--- End diff --

Ah, missed that. Took another look and it should be fine.


---




[GitHub] spark issue #22327: [SPARK-25330][BUILD] Revert Hadoop 2.7 to 2.7.3

2018-09-05 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/22327
  
The goal for the 2.7.x line should be "nothing breaks", which is precisely why it's only getting critical patches. Reverting might make the problem go away, but you can assume that everyone running HDFS clusters with something based on the 2.7.x line will get this patch in before long: it's best to identify what's up and address it.


---




[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...

2018-09-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22340
  
I see. It does seem like we don't want to run with test env variables in this context. I was going to ask: do we ever? Should this function always strip the env variables for testing? I can see being conservative and restricting it to this case.

It seems like just stripping `SPARK_DIST_CLASSPATH` and maybe `SPARK_PREPEND_CLASSES` would address the issue?


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95724/


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22328
  
**[Test build #95724 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95724/testReport)**
 for PR 22328 at commit 
[`218ce4c`](https://github.com/apache/spark/commit/218ce4cf796308c8705a27889b25100e2b779365).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22327: [SPARK-25330][BUILD] Revert Hadoop 2.7 to 2.7.3

2018-09-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22327
  
If it's an unintentional regression in Hadoop, and it's impacting users, I'd agree with downgrading the version here (at least to the version that doesn't exhibit the issue).

If it were intentional behavior, we'd really need to work around it.

@steveloughran it seems like you think it's a regression, insofar as behavior changes like this aren't expected in maintenance releases?


---




[GitHub] spark issue #22262: [SPARK-25175][SQL] Field resolution should fail if there...

2018-09-05 Thread seancxmao
Github user seancxmao commented on the issue:

https://github.com/apache/spark/pull/22262
  
@dongjoon-hyun I have updated the PR description to explain in more detail. As you mentioned, this PR is specific to the case of reading from a data source table persisted in the metastore.


---




[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...

2018-09-05 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22323#discussion_r215338145
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -215,6 +215,19 @@ 
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai
 
 The configuration properties for mounting volumes into the executor pods 
use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. 
For a complete list of available options for each supported type of volumes, 
please refer to the [Spark Properties](#spark-properties) section below. 
 
+## Local Storage
+
+Spark uses temporary scratch space to spill data to disk during shuffles and other operations.  When using Kubernetes as the resource manager, the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `SPARK_LOCAL_DIRS`.  If no directories are explicitly specified, a default directory is created and configured appropriately.
+
+`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod.
+
+### Using RAM for local storage
+
+As `emptyDir` volumes use the node's backing storage for ephemeral storage, this default behaviour may not be appropriate for some compute environments.  For example, if you have diskless nodes with remote storage mounted over a network, having lots of executors doing IO to this remote storage may actually degrade performance.
+
+In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration, which will cause the `emptyDir` volumes to be configured as `tmpfs`, i.e. RAM-backed volumes.  When configured like this, Spark's local storage usage will count towards your pods' memory usage, so you may wish to increase your memory requests via the normal `spark.driver.memory` and `spark.executor.memory` configuration properties.
--- End diff --

Ok, I'll update the proposed doc changes to mention both
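For illustration, the settings being documented might be applied like so (a sketch; the memory values are placeholders, and `spark.kubernetes.local.dirs.tmpfs` is the property this PR proposes):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.kubernetes.local.dirs.tmpfs", "true") // emptyDir volumes become tmpfs
  .set("spark.executor.memory", "8g")               // bumped to absorb tmpfs usage
  .set("spark.driver.memory", "4g")
```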


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread suryag10
Github user suryag10 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215338063
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopBootstrapUtil.scala
 ---
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadoopsteps
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, 
ContainerBuilder, KeyToPathBuilder, PodBuilder}
+
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+import 
org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+private[spark] object HadoopBootstrapUtil {
+
+   /**
+* Mounting the DT secret for both the Driver and the executors
+*
+* @param dtSecretName Name of the secret that stores the Delegation 
Token
+* @param dtSecretItemKey Name of the Item Key storing the Delegation 
Token
+* @param userName Name of the SparkUser to set SPARK_USER
+* @param fileLocation Location of the krb5 file
+* @param krb5ConfName Name of the ConfigMap for Krb5
+* @param pod Input pod to be appended to
+* @return a modified SparkPod
+*/
+  def bootstrapKerberosPod(
+  dtSecretName: String,
+  dtSecretItemKey: String,
+  userName: String,
+  fileLocation: String,
+  krb5ConfName: String,
+  pod: SparkPod) : SparkPod = {
+  val krb5File = new File(fileLocation)
+  val fileStringPath = krb5File.toPath.getFileName.toString
+  val kerberizedPod = new PodBuilder(pod.pod)
+.editOrNewSpec()
+  .addNewVolume()
+.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+.withNewSecret()
+  .withSecretName(dtSecretName)
+  .endSecret()
+.endVolume()
+  .addNewVolume()
+.withName(KRB_FILE_VOLUME)
+  .withNewConfigMap()
+.withName(krb5ConfName)
+.withItems(new KeyToPathBuilder()
+  .withKey(fileStringPath)
+  .withPath(fileStringPath)
+  .build())
+.endConfigMap()
+  .endVolume()
+// TODO: (ifilonenko) make configurable PU(G)ID
+  .editOrNewSecurityContext()
+.withRunAsUser(1000L)
+.withFsGroup(2000L)
+.endSecurityContext()
+  .endSpec()
+.build()
+  val kerberizedContainer = new ContainerBuilder(pod.container)
+.addNewVolumeMount()
+  .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+  .withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR)
+  .endVolumeMount()
+.addNewVolumeMount()
+  .withName(KRB_FILE_VOLUME)
+  .withMountPath(KRB_FILE_DIR_PATH)
--- End diff --

Hi Stavros,
The main aim of the following step was to mount the krb5.conf file into the 
/etc/ directory:

.addNewVolumeMount()
  .withName(KRB_FILE_VOLUME)
  .withMountPath(KRB_FILE_DIR_PATH)

When the above is done, all of the contents originally present in the 
container's /etc/ directory are lost: the directory ends up holding only the 
krb5.conf file, mounted read-only. Because the rest of /etc/ is gone, nothing 
works. As I commented earlier, this also makes the driver pod fail to spawn. 
To mount only the krb5.conf file correctly, the following should be done 
instead:

.addNewVolumeMount()
  .withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf")
  .withSubPath("krb5.conf")

With this, everything works without any deletions from the /etc/ directory.
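For reference, a minimal sketch of the suggested subPath mount in the fabric8 
builder style, restoring the `.withName` call from the first snippet 
(`KRB_FILE_VOLUME` and `KRB_FILE_DIR_PATH` are the constants already used in 
this file):

```scala
// Mount only krb5.conf via a subPath so the rest of /etc/ is preserved.
val kerberizedContainer = new ContainerBuilder(pod.container)
  .addNewVolumeMount()
    .withName(KRB_FILE_VOLUME)
    .withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf")
    .withSubPath("krb5.conf")
    .endVolumeMount()
  .build()
```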

[GitHub] spark issue #20999: [WIP][SPARK-14922][SPARK-23866][SQL] Support partition f...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20999: [WIP][SPARK-14922][SPARK-23866][SQL] Support partition f...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2880/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...

2018-09-05 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20999
  
I am reopening this according to the discussion in 
https://github.com/apache/spark/pull/19691.

cc @maropu @DazhuangSu


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22327: [SPARK-25330][BUILD] Revert Hadoop 2.7 to 2.7.3

2018-09-05 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/22327
  
The 2.7.x branch updates are generally all security fixes plus some fixes for 
JVM/JDK regressions.

Without looking at the details, you can assume that the regression is related 
to one of these, and for that reason I wouldn't recommend rolling back. Better 
to find the problem and come up with a fix or workaround.

Created [HADOOP-15722](https://issues.apache.org/jira/browse/HADOOP-15722) 
to cover this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20999: [WIP][SPARK-23866][SQL] Support partition filters in ALT...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20999
  
**[Test build #95725 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95725/testReport)**
 for PR 20999 at commit 
[`148f477`](https://github.com/apache/spark/commit/148f47742cae892260c46f9ffa97bb2d0422701d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20999: [WIP][SPARK-23866][SQL] Support partition filters...

2018-09-05 Thread mgaido91
GitHub user mgaido91 reopened a pull request:

https://github.com/apache/spark/pull/20999

[WIP][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP 
PARTITION

## What changes were proposed in this pull request?

Hive has long supported dropping partitions using any kind of comparison 
operator on them. So far, Spark supports dropping partitions only by exact 
values. For instance, Spark doesn't support:

```
ALTER TABLE mytable DROP PARTITION(mydate < '2018-04-06')
```

This PR adds support for this syntax too.

## How was this patch tested?

UTs to be added


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23866

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20999.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20999


commit b57a5d1797dbe206aeb0a4d2a24ccd0c73845dc8
Author: Marco Gaido 
Date:   2018-04-05T15:35:19Z

[WIP][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP 
PARTITION

commit 148f47742cae892260c46f9ffa97bb2d0422701d
Author: Marco Gaido 
Date:   2018-09-05T16:11:38Z

adding UT from SPARK-14922 by DazhuangSu




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22340
  
Also, cc @gatorsmile .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...

2018-09-05 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22141#discussion_r215333831
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql
 ---
@@ -0,0 +1,198 @@
+-- Tests NOT-IN subqueries nested inside OR expression(s).
--- End diff --

@maropu Oh.. thank you very much for trying out.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22340
  
`HiveExternalCatalogVersionsSuite` downloads old Spark releases and executes 
them in their own directories. However, it passed the following environment 
variables through to the child process, and none of them is expected there. 
`SPARK_DIST_CLASSPATH` in particular is related to this bug.
- SPARK_TESTING
- SPARK_SQL_TESTING
- SPARK_PREPEND_CLASSES
- SPARK_DIST_CLASSPATH

Previously the new Spark classes sat behind the old Spark classes on the 
classpath, so the new ones were never seen. Spark 2.4.0 reveals this bug 
because of the recent data source class changes.
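A minimal sketch of the kind of fix this implies, assuming the child 
spark-submit is launched through a `ProcessBuilder` (`sparkSubmitCommand` here 
is illustrative, not the actual test helper):

```scala
// Clear the inherited SPARK_* test variables so the child (old) Spark
// cannot see the new Spark's classes via SPARK_DIST_CLASSPATH and friends.
val builder = new ProcessBuilder(sparkSubmitCommand: _*)
val env = builder.environment()
Seq("SPARK_TESTING", "SPARK_SQL_TESTING", "SPARK_PREPEND_CLASSES", "SPARK_DIST_CLASSPATH")
  .foreach(env.remove)
val process = builder.start()
```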


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95711/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22320
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22328
  
LGTM pending tests.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22320
  
**[Test build #95711 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95711/testReport)**
 for PR 22320 at commit 
[`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22328
  
**[Test build #95724 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95724/testReport)**
 for PR 22328 at commit 
[`218ce4c`](https://github.com/apache/spark/commit/218ce4cf796308c8705a27889b25100e2b779365).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2879/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...

2018-09-05 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22141
  
@maropu I think this will be considered post-2.4, so we have some time. I am 
now exploring options to plan better at the physical level.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...

2018-09-05 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22165
  
I didn't make a full pass over the tests. @jiangxb1987 let me know if you 
need me to take a pass. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/22165#discussion_r215326727
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/BarrierCoordinatorSuite.scala ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.TimeoutException
+
+import scala.concurrent.duration._
+import scala.language.postfixOps
+
+import org.apache.spark._
+import org.apache.spark.rpc.RpcTimeout
+
+class BarrierCoordinatorSuite extends SparkFunSuite with LocalSparkContext 
{
+
+  /**
+   * Get the current barrierEpoch from barrierCoordinator.states by 
ContextBarrierId
+   */
+  def getCurrentBarrierEpoch(
+  stageId: Int, stageAttemptId: Int, barrierCoordinator: 
BarrierCoordinator): Int = {
+val barrierId = ContextBarrierId(stageId, stageAttemptId)
+barrierCoordinator.states.get(barrierId).barrierEpoch
+  }
+
+  test("normal test for single task") {
+sc = new SparkContext("local", "test")
+val barrierCoordinator = new BarrierCoordinator(5, sc.listenerBus, 
sc.env.rpcEnv)
+val rpcEndpointRef = sc.env.rpcEnv.setupEndpoint("barrierCoordinator", 
barrierCoordinator)
+val stageId = 0
+val stageAttemptNumber = 0
+rpcEndpointRef.askSync[Unit](
+  message = RequestToSync(numTasks = 1, stageId, stageAttemptNumber, 
taskAttemptId = 0,
+barrierEpoch = 0),
+  timeout = new RpcTimeout(5 seconds, "rpcTimeOut"))
+// sleep for waiting barrierEpoch value change
+Thread.sleep(500)
+assert(getCurrentBarrierEpoch(stageId, stageAttemptNumber, 
barrierCoordinator) == 1)
+  }
+
+  test("normal test for multi tasks") {
+sc = new SparkContext("local", "test")
+val barrierCoordinator = new BarrierCoordinator(5, sc.listenerBus, 
sc.env.rpcEnv)
+val rpcEndpointRef = sc.env.rpcEnv.setupEndpoint("barrierCoordinator", 
barrierCoordinator)
+val numTasks = 3
+val stageId = 0
+val stageAttemptNumber = 0
+val rpcTimeOut = new RpcTimeout(5 seconds, "rpcTimeOut")
+// sync request from 3 tasks
+(0 until numTasks).foreach { taskId =>
+  new Thread(s"task-$taskId-thread") {
+setDaemon(true)
+override def run(): Unit = {
+  rpcEndpointRef.askSync[Unit](
+message = RequestToSync(numTasks, stageId, stageAttemptNumber, 
taskAttemptId = taskId,
+  barrierEpoch = 0),
+timeout = rpcTimeOut)
+}
+  }.start()
+}
+// sleep for waiting barrierEpoch value change
+Thread.sleep(500)
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/22165#discussion_r215326394
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/BarrierCoordinatorSuite.scala ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.TimeoutException
+
+import scala.concurrent.duration._
+import scala.language.postfixOps
+
+import org.apache.spark._
+import org.apache.spark.rpc.RpcTimeout
+
+class BarrierCoordinatorSuite extends SparkFunSuite with LocalSparkContext 
{
+
+  /**
+   * Get the current barrierEpoch from barrierCoordinator.states by 
ContextBarrierId
+   */
+  def getCurrentBarrierEpoch(
+  stageId: Int, stageAttemptId: Int, barrierCoordinator: 
BarrierCoordinator): Int = {
+val barrierId = ContextBarrierId(stageId, stageAttemptId)
+barrierCoordinator.states.get(barrierId).barrierEpoch
+  }
+
+  test("normal test for single task") {
+sc = new SparkContext("local", "test")
+val barrierCoordinator = new BarrierCoordinator(5, sc.listenerBus, 
sc.env.rpcEnv)
+val rpcEndpointRef = sc.env.rpcEnv.setupEndpoint("barrierCoordinator", 
barrierCoordinator)
+val stageId = 0
+val stageAttemptNumber = 0
+rpcEndpointRef.askSync[Unit](
+  message = RequestToSync(numTasks = 1, stageId, stageAttemptNumber, 
taskAttemptId = 0,
+barrierEpoch = 0),
+  timeout = new RpcTimeout(5 seconds, "rpcTimeOut"))
+// sleep for waiting barrierEpoch value change
+Thread.sleep(500)
--- End diff --

Do not use an explicit sleep. It basically adds 0.5 seconds to the total test 
time and introduces flakiness. Use a conditional wait, for example: 
https://github.com/apache/spark/commit/bfb74394a5513134ea1da9fcf4a1783b77dd64e4#diff-a90010f459c27926238d7a4ce5a0aca1R107
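
For instance, the sleep-plus-assert pattern above could become a polling wait; 
a minimal sketch assuming ScalaTest's `Eventually` is available to the suite:

```scala
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Poll until the barrier epoch advances, instead of sleeping a fixed 500 ms.
eventually(timeout(5.seconds), interval(100.millis)) {
  assert(getCurrentBarrierEpoch(stageId, stageAttemptNumber, barrierCoordinator) == 1)
}
```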


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/22165#discussion_r215324595
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -65,7 +65,7 @@ private[spark] class BarrierCoordinator(
 
   // Record all active stage attempts that make barrier() call(s), and the 
corresponding internal
   // state.
-  private val states = new ConcurrentHashMap[ContextBarrierId, 
ContextBarrierState]
+  private[spark] val states = new ConcurrentHashMap[ContextBarrierId, 
ContextBarrierState]
--- End diff --

Could you turn the `// ...` comment into ScalaDoc `/** ... */` and mention 
`Visible for unit testing.` in the doc?
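
That is, something along the lines of:

```scala
/**
 * Record all active stage attempts that make barrier() call(s), and the
 * corresponding internal state. Visible for unit testing.
 */
private[spark] val states = new ConcurrentHashMap[ContextBarrierId, ContextBarrierState]
```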


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18906
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95712/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18906
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18906
  
**[Test build #95712 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95712/testReport)**
 for PR 18906 at commit 
[`97305f5`](https://github.com/apache/spark/commit/97305f53218d7d8ab6588ab2c60d1cb91c14e9dc).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/22328#discussion_r215322021
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source.image
+
+/**
+ * `image` package implements Spark SQL data source API for loading IMAGE 
data as `DataFrame`.
+ * The loaded `DataFrame` has one `StructType` column: `image`.
+ * The schema of the `image` column is:
+ *  - origin: String (represents the origin of the image.
+ *If loaded from files, then it is the file path)
+ *  - height: Int (height of the image)
+ *  - width: Int (width of the image)
+ *  - nChannels: Int (number of the image channels)
+ *  - mode: Int (OpenCV-compatible type)
+ *  - data: BinaryType (Image bytes in OpenCV-compatible order: row-wise 
BGR in most cases)
+ *
+ * To use IMAGE data source, you need to set "image" as the format in 
`DataFrameReader` and
+ * optionally specify the data source options, for example:
+ * {{{
+ *   // Scala
+ *   val df = spark.read.format("image")
+ * .option("dropImageFailures", true)
+ * .load("data/mllib/images/partitioned")
+ *
+ *   // Java
+ *   Dataset df = spark.read().format("image")
+ * .option("dropImageFailures", true)
+ * .load("data/mllib/images/partitioned");
+ * }}}
+ *
+ * IMAGE data source supports the following options:
+ *  - "dropImageFailures": Whether to drop the files that are not valid 
images from the result.
--- End diff --

How about changing `dropImageFailures` to `dropInvalid`?
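
If renamed, the reader-side usage would become (a sketch, assuming the rename 
lands):

```scala
val df = spark.read.format("image")
  .option("dropInvalid", true)
  .load("data/mllib/images/partitioned")
```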


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/22328#discussion_r215322673
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/source/image/ImageOptions.scala ---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source.image
+
+import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
+
+private[image] class ImageOptions(
+@transient private val parameters: CaseInsensitiveMap[String]) extends 
Serializable {
+
+  def this(parameters: Map[String, String]) = 
this(CaseInsensitiveMap(parameters))
+
+  val dropImageFailures = parameters.getOrElse("dropImageFailures", 
"false").toBoolean
--- End diff --

Should add ScalaDoc.
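
For instance, a sketch of the requested doc:

```scala
/** Whether to drop files that are not valid images from the result; defaults to false. */
val dropImageFailures: Boolean = parameters.getOrElse("dropImageFailures", "false").toBoolean
```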


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


