[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-11-22 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/21066
  
+1

One thing to consider here is to be ruthless when bits of the HDFS
APIs/libraries don't suit: rather than think "how do we work around this",
think "what do we need to do to get this fixed".

This includes (based on the HBase & Hive experiences):
* what's marked stable
* serialization of classes
* pulling up of operations from HDFS to the public FileSystem API (source 
of some contention there between myself and the hdfs team as to what 
constitutes acceptable specification and tests)
* thread safety (HBase & encrypted IO)
* various constants in HDFS interfaces tagged as private.
etc.

BTW, I'm thinking of retiring the MRv1 commit APIs, initially by marking them
as deprecated. I'd match that with something to pre-emptively move Spark onto
the V2 ones. After all, it's all bridged internally.







---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-11-21 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21066
  
If we're considering only supporting Hadoop 3 in Spark 3 -- and I think we 
should -- this could even go into the main source tree.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-11-20 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/21066
  
The main barrier to this is the what-do-we-do-about-Hive problem: until that
is resolved, ASF Spark doesn't run against Hadoop 3.x.

It looks like "support Hive 2" is the plan there, *which is the right thing 
to do long term*

Short term, well, we're actually shipping this and the patched Hive 1.2.x
artifacts in HDP-3.0, qualified through our own tests, etc. I'm happy with it.

It's also worth noting that there's work ongoing in Hadoop 3.2-3.3 to add 
multipart upload as an explicit API across filesystems, so you'll be able to 
write committers which can use multipart upload & commit across stores. 


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-11-19 Thread venkey-ariv
Github user venkey-ariv commented on the issue:

https://github.com/apache/spark/pull/21066
  
Are there any plans to merge this? 


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90333/
Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #90333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90333/testReport)** for PR 21066 at commit [`3e1bce3`](https://github.com/apache/spark/commit/3e1bce3b9163de836681c69a2eff8e67108ac7b7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3010/
Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-05-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #90333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90333/testReport)** for PR 21066 at commit [`3e1bce3`](https://github.com/apache/spark/commit/3e1bce3b9163de836681c69a2eff8e67108ac7b7).


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-05-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21066
  
cc @rxin @JoshRosen @zsxwing 


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89856/
Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #89856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89856/testReport)** for PR 21066 at commit [`659a7a4`](https://github.com/apache/spark/commit/659a7a4378cf5afe539ef113faebb7a3f583b1ab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BindingParquetOutputCommitter(`
  * `class PathOutputCommitProtocol(`


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #89856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89856/testReport)** for PR 21066 at commit [`659a7a4`](https://github.com/apache/spark/commit/659a7a4378cf5afe539ef113faebb7a3f583b1ab).


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2678/
Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21066
  
Hi, @mridulm . Could you review this PR please?


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21066
  
Retest this please.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89844/
Test FAILed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #89844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89844/testReport)** for PR 21066 at commit [`659a7a4`](https://github.com/apache/spark/commit/659a7a4378cf5afe539ef113faebb7a3f583b1ab).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BindingParquetOutputCommitter(`
  * `class PathOutputCommitProtocol(`


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2671/
Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #89844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89844/testReport)** for PR 21066 at commit [`659a7a4`](https://github.com/apache/spark/commit/659a7a4378cf5afe539ef113faebb7a3f583b1ab).


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89346/
Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #89346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89346/testReport)** for PR 21066 at commit [`9d02ae7`](https://github.com/apache/spark/commit/9d02ae731e0fe314da312a614baa5664e40eaf80).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2321/
Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #89346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89346/testReport)** for PR 21066 at commit [`9d02ae7`](https://github.com/apache/spark/commit/9d02ae731e0fe314da312a614baa5664e40eaf80).


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

2018-04-13 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/21066
  
The RAT test failure was on a 0-byte .keep file in `src/test/scala`, which is
there because the Maven plugin adding a profile-specific test source path
needs an original one.

The easiest fix is just to add a real Scala file in the source tree, with an
ASF comment. I don't want to add explicit instantiation tests (e.g. new
S3AFileSystem()), because of some classpath conflict between S3AFS on Hadoop
2.8 and Spark's own classpath: there is a risk of failing on some test setups.
It's a legit failure, but...
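
To illustrate why the empty file trips a license audit while a placeholder source file with an ASF comment does not, here is a minimal sketch. It is a simplified stand-in and not Apache RAT's actual logic; the `has_asf_header` helper, the marker string check, the 20-line scan window, and the `Placeholder.scala` contents (including its package name) are all assumptions made for illustration.

```python
# Simplified stand-in for a license-header audit; NOT Apache RAT's real logic.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_asf_header(text: str, scan_lines: int = 20) -> bool:
    """Return True if the ASF marker appears in the first `scan_lines` lines."""
    head = "\n".join(text.splitlines()[:scan_lines])
    return ASF_MARKER in head


# A 0-byte placeholder such as .keep carries no header, so the check flags it.
empty_keep_file = ""

# Hypothetical placeholder Scala source that opens with the ASF comment.
placeholder_scala = """/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 */
package org.example.placeholder

object Placeholder
"""

results = {
    ".keep": has_asf_header(empty_keep_file),
    "Placeholder.scala": has_asf_header(placeholder_scala),
}
print(results)
```

A real audit also handles exclusion lists and several accepted license families, so treat this only as a picture of the failure mode: the placeholder keeps the source directory present for the Maven profile while giving the audit a header to find.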


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][Wip] Add commit protocol binding to...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2320/
Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][Wip] Add commit protocol binding to...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][Wip] Add commit protocol binding to...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #89343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89343/testReport)** for PR 21066 at commit [`3da1f3f`](https://github.com/apache/spark/commit/3da1f3faa6601d38deb259203f2f48b17293f51d).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BindingParquetOutputCommitter(`
  * `class PathOutputCommitProtocol(`


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][Wip] Add commit protocol binding to...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89343/
Test FAILed.


---




[GitHub] spark issue #21066: [SPARK-23977][CLOUD][Wip] Add commit protocol binding to...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21066
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21066: [SPARK-23977][CLOUD][Wip] Add commit protocol binding to...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21066
  
**[Test build #89343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89343/testReport)** for PR 21066 at commit [`3da1f3f`](https://github.com/apache/spark/commit/3da1f3faa6601d38deb259203f2f48b17293f51d).


---
