[jira] [Commented] (HDFS-13660) DistCp job fails when new data is appended in the file while the distCp copy job is running

2019-09-24 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936657#comment-16936657
 ] 

Hudson commented on HDFS-13660:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17367 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17367/])
HDFS-13660. DistCp job fails when new data is appended in the file while 
(stevel: rev 51c64b357d4bd1a0038e61df3d4b8ea0a3ad7449)
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestDistCpUtils.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyCommitter.java
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestDistCpUtilsWithCombineMode.java


> DistCp job fails when new data is appended in the file while the distCp copy 
> job is running
> ---
>
> Key: HDFS-13660
> URL: https://issues.apache.org/jira/browse/HDFS-13660
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: distcp
> Reporter: Mukund Thakur
> Assignee: Mukund Thakur
> Priority: Critical
> Fix For: 3.3.0
>
> Attachments: distcp_failure_when_file_append.log
>
>
> Steps to reproduce:
> Suppose a DistCp MR job is copying the file /tmp/web_returns_merged/data-m-002
> and we append more data to this file using the command
> hadoop fs -appendToFile xaa /tmp/web_returns_merged/data-m-002
> The job then fails with the exception
> Mismatch in length of
> source:hdfs://mycluster0/tmp/web_returns_merged/data-m-002 and target.
> The logs are attached.
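
The symptom in the steps above can be modeled on a local file system. The following is a minimal sketch, not DistCp's code: `copy_with_length_check` is a hypothetical helper, the `on_chunk` hook exists only to simulate a concurrent append, and the model assumes the reader sees only the bytes that existed when the copy began.

```python
import os
import tempfile


def copy_with_length_check(src, dst, chunk_size=4, on_chunk=None):
    """Copy src to dst, then fail if the source length no longer matches.

    Hypothetical local-file model of a post-copy length check. It only
    illustrates the reported symptom and is not DistCp's implementation.
    """
    start_len = os.path.getsize(src)  # length visible when the copy begins
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while copied < start_len:
            chunk = fin.read(min(chunk_size, start_len - copied))
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            if on_chunk is not None:
                on_chunk()  # hook standing in for a concurrent writer
    current_len = os.path.getsize(src)
    if current_len != copied:
        raise OSError(
            f"Mismatch in length of source:{src} ({current_len}) "
            f"and target:{dst} ({copied})"
        )
    return copied


def demo():
    """Return (bytes copied without a writer, error text with a writer)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data-m-002")
        dst = os.path.join(tmp, "copy")
        with open(src, "wb") as f:
            f.write(b"0123456789")
        appended = []

        def append_once():
            # Append to the source exactly once, while the copy is running.
            if not appended:
                with open(src, "ab") as f:
                    f.write(b"xaa")
                appended.append(True)

        clean = copy_with_length_check(src, dst)
        try:
            copy_with_length_check(src, dst, on_chunk=append_once)
        except OSError as exc:
            return clean, str(exc)
        return clean, None
```

In this model the first copy passes the check, while the second raises because the source grew after the copy had fixed the number of bytes it would read.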



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13660) DistCp job fails when new data is appended in the file while the distCp copy job is running

2019-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926519#comment-16926519
 ] 

Hadoop QA commented on HDFS-13660:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-13660 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13660 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27830/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (HDFS-13660) DistCp job fails when new data is appended in the file while the distCp copy job is running

2018-06-08 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506347#comment-16506347
 ] 

Steve Loughran commented on HDFS-13660:
---

Interesting. But at least it failed. A bigger risk would be the file being
replaced by a new file of the same size: if the read crossed a block boundary,
you could end up with a mix of the old and new data. You'd be hard pressed to
safely identify the problem, other than by comparing the source checksum from
before the upload began with the source checksum after it had finished.

# I think the first step here would be to document what you must not do while
an upload is in progress: append to or replace files.
# Longer term: after an upload, detect whether the source has changed; if it
has, warn and maybe repeat the upload. That would use a checksum on HDFS and
the modified timestamp elsewhere.
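
The before/after checksum comparison suggested above might look like the following sketch. It is an illustration under stated assumptions, not the eventual patch: it uses local files and SHA-256 from Python's `hashlib` in place of an HDFS file checksum, and `file_checksum`, `copy_and_detect_change`, and the `during_copy` hook are hypothetical names.

```python
import hashlib
import os
import shutil
import tempfile


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_detect_change(src, dst, during_copy=None):
    """Copy src to dst; return True if the source changed in the meantime."""
    before = file_checksum(src)
    shutil.copyfile(src, dst)
    if during_copy is not None:
        # Hook standing in for a concurrent writer. It runs between the
        # copy and the second checksum, which is enough to show what the
        # comparison detects.
        during_copy()
    after = file_checksum(src)
    return before != after


def demo():
    """Return (changed without writer, changed with writer, same length)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source")
        dst = os.path.join(tmp, "target")
        with open(src, "wb") as f:
            f.write(b"old-data")

        def replace_same_size():
            # Replace the source with different content of the same size.
            with open(src, "wb") as f:
                f.write(b"new-data")

        unchanged = copy_and_detect_change(src, dst)
        changed = copy_and_detect_change(src, dst, during_copy=replace_same_size)
        same_length = os.path.getsize(src) == os.path.getsize(dst)
        return unchanged, changed, same_length
```

The demo covers the same-size replacement case: the source and target lengths still match, so a length check alone would pass, but the checksum comparison reports the change. On HDFS the analogous source checksum would presumably come from `FileSystem.getFileChecksum`.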

