[jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3

2016-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755353#comment-15755353
 ] 

Hadoop QA commented on HADOOP-13786:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
40s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
56s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
31s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
34s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
41s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 10m 25s{color} 
| {color:red} root generated 2 new + 716 unchanged - 0 fixed = 718 total (was 
716) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 41s{color} | {color:orange} root: The patch generated 48 new + 66 unchanged 
- 1 fixed = 114 total (was 67) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 43 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
25s{color} | {color:red} hadoop-aws in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
56s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
44s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13786 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12843630/HADOOP-13786-HADOOP-13345-001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 5d61d5412ac4 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HADOOP-13345 / c7885de |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| javac | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11292/artifact/patchprocess/diff-compile-javac-root.txt
 |
| checkstyle | 

[jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3

2016-12-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755321#comment-15755321
 ] 

Steve Loughran commented on HADOOP-13786:
-

im offline for a bit, and this patch doesn't do the new work, it's a precursor. 

> add output committer which uses s3guard for consistent commits to S3
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3

2016-12-16 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755218#comment-15755218
 ] 

Mingliang Liu commented on HADOOP-13786:


Great work, [~ste...@apache.org]! I'll (learn and) review this year. Thanks,

> add output committer which uses s3guard for consistent commits to S3
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3

2016-12-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755176#comment-15755176
 ] 

Steve Loughran commented on HADOOP-13786:
-

no worries; I had to write everything down to understand it more, look at the 
issues and estimate costs. I need to really define what the different 
committers do too, to make sure that I understand what they are doing *and 
where*.

> add output committer which uses s3guard for consistent commits to S3
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3

2016-12-16 Thread Thomas Demoor (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755099#comment-15755099
 ] 

Thomas Demoor commented on HADOOP-13786:


Had my write-up ready but you beat me too actually submitting 
[~ste...@apache.org]

Filed HADOOP-13912 so we can track this separately, but I think it should go on 
top on what you have already done in the current ticket.



> add output committer which uses s3guard for consistent commits to S3
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3

2016-12-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755057#comment-15755057
 ] 

Steve Loughran commented on HADOOP-13786:
-

A few of us (Thomas, Pieter, Mingliang, Aaron and Sean) had a quick conf call 
earlier this week, where Thomas and Pieter outlined their proposed algorithm 
for implementing zero-rename commits to any consistent S3 endpoint.

I've written up [my interpretation of the 
algorithm|https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delayed-put-commit.md]
 for review, looking though the Hadoop and Spark commit code to see what 
appears to be going on, though I do need to actually document the various 
algorithms better. 

Comments welcome, especially those containing proofs of correctness

> add output committer which uses s3guard for consistent commits to S3
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3

2016-12-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755037#comment-15755037
 ] 

Steve Loughran commented on HADOOP-13786:
-

thanks. I Don't see an easy fix, but I like to collect all data about failing 
commits.



> add output committer which uses s3guard for consistent commits to S3
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3

2016-12-06 Thread Adrian Bridgett (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726711#comment-15726711
 ] 

Adrian Bridgett commented on HADOOP-13786:
--

Stack trace as requested (spark 2.0.2, hadoop 2.7.3):
{noformat}
 16/12/05 10:33:15 WARN scheduler.TaskSetManager: Lost task 170.0 in stage 10.0 
(TID 3390, 10.1.201.20): org.apache.spark.SparkException: Task failed while 
writing rows
 at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
 at org.apache.spark.scheduler.Task.run(Task.scala:86)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.RuntimeException: Failed to commit task
 at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:275)
 at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:257)
 at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
 at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
 at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1348)
 at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
 ... 8 more
 Caused by: java.io.IOException: Failed to rename 
S3AFileStatus{path=s3a://foo-scratch/adrian/dgm/nycextra/final/3dp/_temporary/0/_temporary/attempt_201612051033_0010_m_000170_0/part-r-00170-74752ee1-86f6-44e7-a5df-1c7b1e6fde47.csv.gz;
 isDirectory=false; length=32644; replication=1; blocksize=33554432; 
modification_time=1480933995000; access_time=0; owner=; group=; 
permission=rw-rw-rw-; isSymlink=false} to 
s3a://foo-scratch/adrian/dgm/nycextra/final/3dp/part-r-00170-74752ee1-86f6-44e7-a5df-1c7b1e6fde47.csv.gz
 at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
 at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
 at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:486)
 at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:449)
 at 
org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
 at 
org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
 at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:211)
 at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:270)
 ... 13 more
{noformat}

> add output committer which uses s3guard for consistent commits to S3
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13786) add output committer which uses s3guard for consistent commits to S3

2016-12-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15722345#comment-15722345
 ] 

Steve Loughran commented on HADOOP-13786:
-

SPARK-18512 highlights a consistency problem surfacing during the mergePaths 
treewalk


> add output committer which uses s3guard for consistent commits to S3
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org