[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-04-04 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HADOOP-15273:
---
Fix Version/s: (was: 3.0.2)
   3.0.3

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 3.1.0, 3.0.3
>
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Fix Version/s: 3.0.2

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

committed to branch-3.1+; reran copy mapper test first.
 
Test-wise, this shows we need some more realistic store distcp tests, 
specifically: need to set HDFS <--> store rather than just local <--> store. 
And also: intra-store, inter-store. Which will make it a fairly complex piece 
of work.

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Patch Available  (was: Open)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Open  (was: Patch Available)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Target Version/s: 3.1.0
  Status: Patch Available  (was: Open)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Attachment: HADOOP-15273-003.patch

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Priority: Critical  (was: Blocker)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Open  (was: Patch Available)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Patch Available  (was: Open)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Attachment: HADOOP-15273-002.patch

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Open  (was: Patch Available)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Priority: Blocker  (was: Critical)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Patch Available  (was: Open)

Patch 001

* allows -skipcrccheck everywhere
* when the filesystem schemas are different not the hdfs ones (hdfs, webhdfs, 
swebhdfs) then a filesystem message is printed instead of one about block size
* error message adds \n formatting
* and the correct name of the option to disable the checks

Tests: not easily. Maybe after HADOOP-15209 is in I could do it...we'd need 
something in hadoop-aws with a minihdfs cluster. This is not an easy 
undertaking.

I have manually tested it & verified that yes, the skipcrc goes down.

Even with this patch, I'm wondering whether its best to revert the s3a etag 
feature until we have distcp better able to cope

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Attachment: HADOOP-15273-001.patch

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Description: 
When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
between src and dest store types (e.g hdfs to s3), then the error message will 
talk about blocksize, even when its the underlying checksum protocol itself 
which is the cause for failure

bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
(NOTE: By skipping checksums, one runs the risk of masking data-corruption 
during file-transfer.)

update:  the CRC check takes always place on a distcp upload before the file is 
renamed into place. *and you can't disable it then*

  was:
When using distcp without {{-skipCRC}} . If there's a checksum mismatch between 
src and dest store types (e.g hdfs to s3), then the error message will talk 
about blocksize, even when its the underlying checksum protocol itself which is 
the cause for failure

bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
(NOTE: By skipping checksums, one runs the risk of masking data-corruption 
during file-transfer.)

IF the checksum types are fundamentally different, the error message should say 
so


> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Critical
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Priority: Critical  (was: Minor)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Critical
>
> When using distcp without {{-skipCRC}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> IF the checksum types are fundamentally different, the error message should 
> say so



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Summary: distcp can't handle remote stores with different checksum 
algorithms  (was: distcp to downgrade on checksum algorithm mismatch to "files 
unchanged")

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
>
> When using distcp without {{-skipCRC}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> IF the checksum types are fundamentally different, the error message should 
> say so



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org