[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-04-16 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/20704
  
@megaserg: if you are writing to GCS or Azure, algorithm 2 is fine. If S3 is 
the target, then it's only safe to use with a consistent store (Hadoop 3.0 
+ S3Guard, Amazon Consistent EMR), and you still take a major perf hit because 
renames on S3 are implemented as copies. The S3A committers in Hadoop 3.1 
deliver high-performance commit semantics, and the Netflix committers don't 
(directly) need a consistent store, though you will need one to chain work 
together.

BTW, to verify that the algorithm version setting is actually being read: set 
the version to 3 and expect a stack trace from the version switch code. It's 
what I do to make sure that the FileOutputCommitter isn't actually being picked 
up.
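
The check described above can be sketched as a small model. This is an illustrative Python stand-in, not Hadoop code: the names `resolve_algorithm_version`, `legacy_algorithm_version`, and `setting_is_honoured` are hypothetical, a plain dict stands in for the job configuration, and the default of 1 is an assumption made for the sketch. It only shows why an invalid value such as 3 is a useful probe: a committer that reads the key fails loudly, while one that never reads it carries on silently.

```python
# Illustrative model only; this is not Hadoop code.
ALGORITHM_VERSION_KEY = "mapreduce.fileoutputcommitter.algorithm.version"
SUPPORTED_VERSIONS = (1, 2)


def resolve_algorithm_version(conf, default=1):
    """Return the version a setting-aware committer would use.

    `conf` is a plain dict standing in for the job configuration.
    `default=1` is an assumption made for this sketch.
    """
    version = int(conf.get(ALGORITHM_VERSION_KEY, default))
    if version not in SUPPORTED_VERSIONS:
        # A committer that reads the key fails loudly on a bogus value.
        raise ValueError(f"unsupported algorithm version: {version}")
    return version


def legacy_algorithm_version(conf):
    """Model of a committer that predates the setting: the key is never read."""
    return 1


def setting_is_honoured(resolver):
    """Probe a resolver with version 3; only one that reads the key raises."""
    try:
        resolver({ALGORITHM_VERSION_KEY: "3"})
    except ValueError:
        return True
    return False
```

Probing `resolve_algorithm_version` reports that the setting is honoured, while probing `legacy_algorithm_version` does not, which is the silent-fallback case the stack-trace trick is meant to expose.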


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-04-13 Thread megaserg
Github user megaserg commented on the issue:

https://github.com/apache/spark/pull/20704
  
Thank you @dongjoon-hyun! This was also affecting our Spark job performance!

We're using `mapreduce.fileoutputcommitter.algorithm.version=2` in our 
Spark job config, as recommended e.g. here: 
http://spark.apache.org/docs/latest/cloud-integration.html. We're using 
user-provided Hadoop 2.9.0.

However, since this 2.6.5 JAR was in spark/jars, it was given priority on 
the classpath over the Hadoop-distributed 2.9.0 JAR. The 2.6.5 JAR silently 
ignored the `mapreduce.fileoutputcommitter.algorithm.version` setting and used 
the default, slow algorithm (I believe hadoop-mapreduce-client-core only had 
one, slow, algorithm until 2.7.0).

I believe this affects everyone who uses any mapreduce settings with Spark 
2.3.0. Great job!

Can we double-check that this JAR is not present in the "without-hadoop" 
Spark distribution anymore?
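
One way to run such a double-check is to scan the distribution's `jars` directory for the artifact by file name. The following is a minimal sketch, assuming the JAR follows the usual `artifact-version.jar` naming with a purely numeric version; the function name `find_mapreduce_client_core` is hypothetical and the directory path is supplied by the caller.

```python
import re
from pathlib import Path

# Matches names such as hadoop-mapreduce-client-core-2.6.5.jar.
# Assumption: the version is purely numeric (no -SNAPSHOT or other suffix).
JAR_PATTERN = re.compile(r"^hadoop-mapreduce-client-core-(\d+(?:\.\d+)*)\.jar$")


def find_mapreduce_client_core(jars_dir):
    """Return sorted (file name, version) pairs for matching JARs in jars_dir."""
    hits = []
    for path in Path(jars_dir).iterdir():
        match = JAR_PATTERN.match(path.name)
        if match:
            hits.append((path.name, match.group(1)))
    return sorted(hits)
```

An empty result for the `jars` directory of an unpacked "without-hadoop" distribution would indicate that the JAR is no longer bundled; any hit also reports which version was found.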


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-02 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/20704
  
kicks in downstream depending on the order of imports; Maven picks the 
closest declaration in the dependency graph. If you explicitly add hadoop-client 
at the top of your deps, then everything gets reconciled consistently.
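
The closest-first rule can be illustrated with a small model. This is a sketch of Maven's "nearest definition wins" mediation (ties at the same depth go to the first declaration), not Maven's actual resolver; the function name `resolve_nearest_first` is hypothetical, and the dependency graph in the example is simplified and made up for illustration, so real dependency trees will differ.

```python
from collections import deque


def resolve_nearest_first(direct_deps, transitive):
    """Pick one version per artifact, preferring the shallowest declaration.

    direct_deps: list of (artifact, version) pairs in declaration order.
    transitive: dict mapping (artifact, version) to its own dependency list.
    """
    chosen = {}
    queue = deque(direct_deps)  # breadth-first: depth 1 before depth 2, and so on
    while queue:
        artifact, version = queue.popleft()
        if artifact in chosen:
            continue  # a nearer (or earlier-declared) version already won
        chosen[artifact] = version
        queue.extend(transitive.get((artifact, version), []))
    return chosen


# Hypothetical, simplified graph: both parents pull in a different version
# of the same artifact at the same depth.
EXAMPLE_GRAPH = {
    ("spark-core", "2.3.0"): [("hadoop-mapreduce-client-core", "2.6.5")],
    ("hadoop-client", "2.9.0"): [("hadoop-mapreduce-client-core", "2.9.0")],
}

spark_first = resolve_nearest_first(
    [("spark-core", "2.3.0"), ("hadoop-client", "2.9.0")], EXAMPLE_GRAPH
)
hadoop_first = resolve_nearest_first(
    [("hadoop-client", "2.9.0"), ("spark-core", "2.3.0")], EXAMPLE_GRAPH
)
```

In this toy graph the two candidates sit at the same depth, so declaration order decides: `spark_first` ends up with 2.6.5 and `hadoop_first` with 2.9.0, which is why declaring hadoop-client at the top changes the outcome.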


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20704
  
Yeah, I'm just wondering why that didn't happen in the dependency:tree 
output in your description. Anyway, not really important to figure that out. 


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20704
  
Thank you for the review and merging, @vanzin.

We generated both `spark-deps-hadoop-2.6` and `spark-deps-hadoop-2.7` with 
the following command:
```
./dev/test-dependencies.sh --replace-manifest
```
`sbt` and `maven` choose the latest artifacts during the full build, so 
this issue doesn't affect the Apache Spark distribution.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20704
  
Hmm, I guess it was just luck that this didn't trigger the deps check, 
since that jar is checked for a specific version (2.7.3 in the case of 
hadoop2.7).

LGTM, merging to master / 2.3. 


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20704
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87854/
Test PASSed.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20704
  
**[Test build #87854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87854/testReport)** for PR 20704 at commit [`dbb5ae5`](https://github.com/apache/spark/commit/dbb5ae504786ede4a336faa4033809a63ec10f92).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20704
  
Thank you for the review, @jerryshao!


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20704
  
LGTM.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20704
  
**[Test build #87854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87854/testReport)** for PR 20704 at commit [`dbb5ae5`](https://github.com/apache/spark/commit/dbb5ae504786ede4a336faa4033809a63ec10f92).


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1205/
Test PASSed.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20704
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20704
  
Retest this please.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20704
  
The failure is due to a flaky test.
```
org.apache.spark.sql.execution.streaming.RateSourceV2Suite.basic microbatch execution
```


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20704
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87848/
Test FAILed.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20704
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20704
  
**[Test build #87848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87848/testReport)** for PR 20704 at commit [`dbb5ae5`](https://github.com/apache/spark/commit/dbb5ae504786ede4a336faa4033809a63ec10f92).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1199/
Test PASSed.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20704
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20704: [SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-co...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20704
  
**[Test build #87848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87848/testReport)** for PR 20704 at commit [`dbb5ae5`](https://github.com/apache/spark/commit/dbb5ae504786ede4a336faa4033809a63ec10f92).


---
