[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688017#comment-15688017 ] Hudson commented on HADOOP-13655: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10875 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10875/]) HADOOP-13655. document object store use with fs shell and distcp. (liuml07: rev beb70fed4f15cd4afe8ea23e6068a8344d3557b1) * (edit) hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md * (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm * (edit) hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md > document object store use with fs shell and distcp > -- > > Key: HADOOP-13655 > URL: https://issues.apache.org/jira/browse/HADOOP-13655 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs, fs/s3 >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 2.7.4, 3.0.0-alpha2 > > Attachments: HADOOP-13655.000.patch > > > There's no specific docs for working with object stores from the {{hadoop > fs}} shell or in distcp; people either suffer from this (performance, > billing), or learn through trial and error what to do. > Add a section in both fs shell and distcp docs covering use with object > stores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687979#comment-15687979 ] Mingliang Liu commented on HADOOP-13655: Per offline discussion with Steve, the patch should go to {{trunk}} as well. The changes are pretty much the same across branches. > document object store use with fs shell and distcp > -- > > Key: HADOOP-13655 > URL: https://issues.apache.org/jira/browse/HADOOP-13655 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs, fs/s3 >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 2.7.4, 3.0.0-alpha2 > > Attachments: HADOOP-13655.000.patch > > > There's no specific docs for working with object stores from the {{hadoop > fs}} shell or in distcp; people either suffer from this (performance, > billing), or learn through trial and error what to do. > Add a section in both fs shell and distcp docs covering use with object > stores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687920#comment-15687920 ] ASF GitHub Bot commented on HADOOP-13655: - Github user asfgit closed the pull request at: https://github.com/apache/hadoop/pull/131 > document object store use with fs shell and distcp > -- > > Key: HADOOP-13655 > URL: https://issues.apache.org/jira/browse/HADOOP-13655 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs, fs/s3 >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HADOOP-13655.000.patch > > > There's no specific docs for working with object stores from the {{hadoop > fs}} shell or in distcp; people either suffer from this (performance, > billing), or learn through trial and error what to do. > Add a section in both fs shell and distcp docs covering use with object > stores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685466#comment-15685466 ]

Hadoop QA commented on HADOOP-13655:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 49s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 1s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 10 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 50s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:b59b8b7 |
| JIRA Issue | HADOOP-13655 |
| GITHUB PR | https://github.com/apache/hadoop/pull/131 |
| Optional Tests | asflicense mvnsite |
| uname | Linux 8b08858dd925 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 4b289d5 |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/3/artifact/patchprocess/whitespace-eol.txt |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp U: . |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/3/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> document object store use with fs shell and distcp
> --
>
> Key: HADOOP-13655
> URL: https://issues.apache.org/jira/browse/HADOOP-13655
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: documentation, fs, fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13655.000.patch
>
> There's no specific docs for working with object stores from the {{hadoop
> fs}} shell or in distcp; people either suffer from this (performance,
> billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object
> stores.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683631#comment-15683631 ] ASF GitHub Bot commented on HADOOP-13655: - Github user steveloughran commented on a diff in the pull request: https://github.com/apache/hadoop/pull/131#discussion_r88898982 --- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm --- @@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources The SSL configuration file must be in the class-path of the DistCp program. +$H3 DistCp and Object Stores + +DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift. + +Prequisites + +1. The JAR containing the object store implementation is on the classpath, +along with all of its dependencies. +1. Unless the JAR automatically registers its bundled filesystem clients, +the configuration may need to be modified to state the class which +implements the filesystem schema. All of the ASF's own object store clients +are self-registering. +1. The relevant object store access credentials must be available in the cluster +configuration, or be otherwise available in all cluster hosts. + +DistCp can be used to upload data + +```bash +hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1 +``` + +To download data + +```bash +hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results +``` + +To copy data between object stores + +```bash +hadoop distcp s3a://bucket/generated/results \ + wasb://upda...@example.blob.core.windows.net +``` + +And do copy data within an object store + +```bash +hadoop distcp wasb://upda...@example.blob.core.windows.net/current \ + wasb://upda...@example.blob.core.windows.net/old +``` + +And to use `-update` to only copy changed files. 
+ +```bash +hadoop distcp -update -numListstatusThreads 20 \ + swift://history.cluster1/2016 \ + hdfs://nn1:8020/history/2016 +``` + +Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation +on a large directory tree (the limit is 40 threads). + +When `DistCp -update` is used with objec stores, +generally only the modification time and length of the individual files are compared, +not any checksums. The fact that most object stores do have valid timestamps +for directories is irrelevant; only the file timestamps are compared. +However, it is important to have the clock of the client computers close +to that of the infrastructure, so that timestamps are consistent between +the client/HDFS cluster and that of the object store. Otherwise, changed files may be +missed/copied too often. + +**Notes** + +* The `-atomic` option causes a rename of the temporary data, so significantly +increases the time to commit work at the end of the operation. Furthermore, +as Object Stores other than (optionally) `wasb://` do not offer atomic renames of directories +the `-atomic` operation doesn't actually deliver what is promised. *Avoid*. + +* The `-append` option is not supported. + +* The `-diff` option is not supported --- End diff -- ok > document object store use with fs shell and distcp > -- > > Key: HADOOP-13655 > URL: https://issues.apache.org/jira/browse/HADOOP-13655 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs, fs/s3 >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > > There's no specific docs for working with object stores from the {{hadoop > fs}} shell or in distcp; people either suffer from this (performance, > billing), or learn through trial and error what to do. > Add a section in both fs shell and distcp docs covering use with object > stores. 
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671870#comment-15671870 ] ASF GitHub Bot commented on HADOOP-13655: - Github user liuml07 commented on a diff in the pull request: https://github.com/apache/hadoop/pull/131#discussion_r88332309 --- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm --- @@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources The SSL configuration file must be in the class-path of the DistCp program. +$H3 DistCp and Object Stores + +DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift. + +Prequisites + +1. The JAR containing the object store implementation is on the classpath, +along with all of its dependencies. +1. Unless the JAR automatically registers its bundled filesystem clients, +the configuration may need to be modified to state the class which +implements the filesystem schema. All of the ASF's own object store clients +are self-registering. +1. The relevant object store access credentials must be available in the cluster +configuration, or be otherwise available in all cluster hosts. + +DistCp can be used to upload data + +```bash +hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1 +``` + +To download data + +```bash +hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results +``` + +To copy data between object stores + +```bash +hadoop distcp s3a://bucket/generated/results \ + wasb://upda...@example.blob.core.windows.net +``` + +And do copy data within an object store + +```bash +hadoop distcp wasb://upda...@example.blob.core.windows.net/current \ + wasb://upda...@example.blob.core.windows.net/old +``` + +And to use `-update` to only copy changed files. 
+ +```bash +hadoop distcp -update -numListstatusThreads 20 \ + swift://history.cluster1/2016 \ + hdfs://nn1:8020/history/2016 +``` + +Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation +on a large directory tree (the limit is 40 threads). + +When `DistCp -update` is used with objec stores, --- End diff -- objec -> object > document object store use with fs shell and distcp > -- > > Key: HADOOP-13655 > URL: https://issues.apache.org/jira/browse/HADOOP-13655 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs, fs/s3 >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > > There's no specific docs for working with object stores from the {{hadoop > fs}} shell or in distcp; people either suffer from this (performance, > billing), or learn through trial and error what to do. > Add a section in both fs shell and distcp docs covering use with object > stores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
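As a rough illustration of the `-numListstatusThreads` advice in the quoted patch, the sketch below builds a `distcp -update` command line and caps the listing thread count at 40, the limit the patch text states. The function name `distcp_update_cmd` and the variable `MAX_LIST_THREADS` are hypothetical and not part of Hadoop; the function only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print a `hadoop distcp -update`
# command, capping -numListstatusThreads at the limit of 40 quoted in the patch.
MAX_LIST_THREADS=40

distcp_update_cmd() {
  local threads="$1" src="$2" dest="$3"
  # Clamp the requested thread count to the documented maximum.
  if [ "$threads" -gt "$MAX_LIST_THREADS" ]; then
    threads="$MAX_LIST_THREADS"
  fi
  # Only print the command; the caller decides whether to execute it.
  printf 'hadoop distcp -update -numListstatusThreads %s %s %s\n' \
    "$threads" "$src" "$dest"
}

# Example: a request for 64 listing threads is reduced to 40.
distcp_update_cmd 64 swift://history.cluster1/2016 hdfs://nn1:8020/history/2016
```

Printing instead of executing keeps the sketch safe to try on a machine without a Hadoop client; the printed line can be reviewed and then run by hand.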
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671867#comment-15671867 ] ASF GitHub Bot commented on HADOOP-13655: - Github user liuml07 commented on a diff in the pull request: https://github.com/apache/hadoop/pull/131#discussion_r88343643 --- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm --- @@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources The SSL configuration file must be in the class-path of the DistCp program. +$H3 DistCp and Object Stores + +DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift. + +Prequisites + +1. The JAR containing the object store implementation is on the classpath, +along with all of its dependencies. +1. Unless the JAR automatically registers its bundled filesystem clients, +the configuration may need to be modified to state the class which +implements the filesystem schema. All of the ASF's own object store clients +are self-registering. +1. The relevant object store access credentials must be available in the cluster +configuration, or be otherwise available in all cluster hosts. + +DistCp can be used to upload data + +```bash +hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1 +``` + +To download data + +```bash +hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results +``` + +To copy data between object stores + +```bash +hadoop distcp s3a://bucket/generated/results \ + wasb://upda...@example.blob.core.windows.net +``` + +And do copy data within an object store + +```bash +hadoop distcp wasb://upda...@example.blob.core.windows.net/current \ + wasb://upda...@example.blob.core.windows.net/old +``` + +And to use `-update` to only copy changed files. 
+ +```bash +hadoop distcp -update -numListstatusThreads 20 \ + swift://history.cluster1/2016 \ + hdfs://nn1:8020/history/2016 +``` + +Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation +on a large directory tree (the limit is 40 threads). + +When `DistCp -update` is used with objec stores, +generally only the modification time and length of the individual files are compared, +not any checksums. The fact that most object stores do have valid timestamps +for directories is irrelevant; only the file timestamps are compared. +However, it is important to have the clock of the client computers close +to that of the infrastructure, so that timestamps are consistent between +the client/HDFS cluster and that of the object store. Otherwise, changed files may be +missed/copied too often. + +**Notes** + +* The `-atomic` option causes a rename of the temporary data, so significantly +increases the time to commit work at the end of the operation. Furthermore, +as Object Stores other than (optionally) `wasb://` do not offer atomic renames of directories +the `-atomic` operation doesn't actually deliver what is promised. *Avoid*. + +* The `-append` option is not supported. + +* The `-diff` option is not supported + +* CRC checking will not be performed, irrespective of the value of the `-skipCrc` +flag. + +* All `-p` options, including those to preserve permissions, user and group information, attributes +checksums and replication are generally ignored. The `wasb://` connector will +preserve the information, but not enforce the permissions. + +* Some object store connectors offer an option for in-memory buffering of +output —for example the S3A connector. Using such option while copying +large files may trigger some form of out of memory event, +be it a heap overflow or a YARN container termination. 
+This is particularly common if the network bandwidth +between the cluster and the object store is limited (such as when working +with remote object stores). It is best to disable/avoid such options and +rely on disk buffering. + +* Copy operations within a single object store still take place in the Hadoop cluster +—even when the object store implements a more efficient COPY operation internally + +That is, an operation such as --- End diff -- The indention is unnecessary? > document object store use with fs shell and distcp > -- > > Key: HADOOP-13655 > URL: https://issues.apache.org/jira/browse/HADOOP-13655 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs, fs/s3 >Affects
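The buffering note in the quoted patch could translate into a command like the one sketched here. It assumes a Hadoop release in which the S3A connector reads `fs.s3a.fast.upload.buffer` (2.8 and later); on 2.7.x the in-memory upload path is instead switched by the boolean `fs.s3a.fast.upload`, so the S3A documentation for the release in use should be checked first. The helper name `distcp_disk_buffered_cmd` is hypothetical, and it only prints the command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print a distcp command that asks
# the S3A connector to buffer uploads on local disk rather than in memory.
# Assumption: the release honours fs.s3a.fast.upload.buffer (Hadoop 2.8+).
distcp_disk_buffered_cmd() {
  local src="$1" dest="$2"
  printf 'hadoop distcp -Dfs.s3a.fast.upload.buffer=disk %s %s\n' "$src" "$dest"
}

# Example: upload from HDFS to an S3A bucket with disk buffering requested.
distcp_disk_buffered_cmd hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
```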
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671868#comment-15671868 ] ASF GitHub Bot commented on HADOOP-13655: - Github user liuml07 commented on a diff in the pull request: https://github.com/apache/hadoop/pull/131#discussion_r88342205 --- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm --- @@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources The SSL configuration file must be in the class-path of the DistCp program. +$H3 DistCp and Object Stores + +DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift. + +Prequisites + +1. The JAR containing the object store implementation is on the classpath, +along with all of its dependencies. +1. Unless the JAR automatically registers its bundled filesystem clients, +the configuration may need to be modified to state the class which +implements the filesystem schema. All of the ASF's own object store clients +are self-registering. +1. The relevant object store access credentials must be available in the cluster +configuration, or be otherwise available in all cluster hosts. + +DistCp can be used to upload data + +```bash +hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1 +``` + +To download data + +```bash +hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results +``` + +To copy data between object stores + +```bash +hadoop distcp s3a://bucket/generated/results \ + wasb://upda...@example.blob.core.windows.net +``` + +And do copy data within an object store + +```bash +hadoop distcp wasb://upda...@example.blob.core.windows.net/current \ + wasb://upda...@example.blob.core.windows.net/old +``` + +And to use `-update` to only copy changed files. 
+ +```bash +hadoop distcp -update -numListstatusThreads 20 \ + swift://history.cluster1/2016 \ + hdfs://nn1:8020/history/2016 +``` + +Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation +on a large directory tree (the limit is 40 threads). + +When `DistCp -update` is used with objec stores, +generally only the modification time and length of the individual files are compared, +not any checksums. The fact that most object stores do have valid timestamps +for directories is irrelevant; only the file timestamps are compared. +However, it is important to have the clock of the client computers close +to that of the infrastructure, so that timestamps are consistent between +the client/HDFS cluster and that of the object store. Otherwise, changed files may be +missed/copied too often. + +**Notes** + +* The `-atomic` option causes a rename of the temporary data, so significantly +increases the time to commit work at the end of the operation. Furthermore, +as Object Stores other than (optionally) `wasb://` do not offer atomic renames of directories +the `-atomic` operation doesn't actually deliver what is promised. *Avoid*. + +* The `-append` option is not supported. + +* The `-diff` option is not supported --- End diff -- The `-diff/-rdiff` option is not supported. Yes, there is an `-rdiff` option that was just added. > document object store use with fs shell and distcp > -- > > Key: HADOOP-13655 > URL: https://issues.apache.org/jira/browse/HADOOP-13655 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs, fs/s3 >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > > There's no specific docs for working with object stores from the {{hadoop > fs}} shell or in distcp; people either suffer from this (performance, > billing), or learn through trial and error what to do. 
> Add a section in both fs shell and distcp docs covering use with object > stores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671869#comment-15671869 ] ASF GitHub Bot commented on HADOOP-13655: - Github user liuml07 commented on a diff in the pull request: https://github.com/apache/hadoop/pull/131#discussion_r88338833 --- Diff: hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md --- @@ -729,3 +757,280 @@ usage Usage: `hadoop fs -usage command` Return the help for an individual command. + + +Working with Object Storage --- End diff -- `` is here accidentally, I guess? > document object store use with fs shell and distcp > -- > > Key: HADOOP-13655 > URL: https://issues.apache.org/jira/browse/HADOOP-13655 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs, fs/s3 >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > > There's no specific docs for working with object stores from the {{hadoop > fs}} shell or in distcp; people either suffer from this (performance, > billing), or learn through trial and error what to do. > Add a section in both fs shell and distcp docs covering use with object > stores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602414#comment-15602414 ]

Hadoop QA commented on HADOOP-13655:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 25s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 57 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 35s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:b59b8b7 |
| JIRA Issue | HADOOP-13655 |
| GITHUB PR | https://github.com/apache/hadoop/pull/131 |
| Optional Tests | asflicense mvnsite |
| uname | Linux 997497522d2b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 086577c |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/10871/artifact/patchprocess/whitespace-eol.txt |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp U: . |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/10871/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> document object store use with fs shell and distcp
> --
>
> Key: HADOOP-13655
> URL: https://issues.apache.org/jira/browse/HADOOP-13655
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: documentation, fs, fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
>
> There's no specific docs for working with object stores from the {{hadoop
> fs}} shell or in distcp; people either suffer from this (performance,
> billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object
> stores.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15529527#comment-15529527 ]

ASF GitHub Bot commented on HADOOP-13655:

Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r80909422

--- Diff: hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md ---
@@ -315,7 +324,11 @@ Returns 0 on success and -1 on error.
 
 Options:
 
-The -f option will overwrite the destination if it already exists.
+* `-p` : Preserves access and modification times, ownership and the permissions.
+(assuming the permissions can be propagated across filesystems)
+* `-f` : Overwrites the destination if it already exists.
+* `-ignorecrc` : Skip CRC checks on the file(s) downloaded.
+* `crc`: write CRC checksums for the files downloaded.
--- End diff --

fixed

> document object store use with fs shell and distcp
> --
>
> Key: HADOOP-13655
> URL: https://issues.apache.org/jira/browse/HADOOP-13655
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: documentation, fs, fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
>
> There's no specific docs for working with object stores from the {{hadoop
> fs}} shell or in distcp; people either suffer from this (performance,
> billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object
> stores.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
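For readers skimming the options list in the hunk above, a minimal sketch of how two of them might be combined follows. It assumes the hunk belongs to the `get` command (the quoted diff does not name it), and both the helper name `fs_get_cmd` and the example paths are hypothetical. The function prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print a `hadoop fs -get` command
# using two options from the quoted hunk:
#   -p  preserve times, ownership and permissions where they can be propagated
#   -f  overwrite the destination if it already exists
# Assumption: the hunk documents the `get` command.
fs_get_cmd() {
  local src="$1" dest="$2"
  printf 'hadoop fs -get -p -f %s %s\n' "$src" "$dest"
}

# Example: download one object to a local path (both paths are made up).
fs_get_cmd s3a://bucket/datasets/example.orc /tmp/example.orc
```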
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528321#comment-15528321 ] Yuanbo Liu commented on HADOOP-13655: - [~ste...@apache.org] I've reviewed your pull request on GitHub. Great work! Since I don't have much knowledge of object stores, I only found a few trivial mistakes there. I would be glad to test those commands if I had an object store environment. Thanks again for your work, well done! > document object store use with fs shell and distcp > -- > > Key: HADOOP-13655 > URL: https://issues.apache.org/jira/browse/HADOOP-13655 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs, fs/s3 >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > > There's no specific docs for working with object stores from the {{hadoop > fs}} shell or in distcp; people either suffer from this (performance, > billing), or learn through trial and error what to do. > Add a section in both fs shell and distcp docs covering use with object > stores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528314#comment-15528314 ] ASF GitHub Bot commented on HADOOP-13655: - Github user yuanboliu commented on a diff in the pull request: https://github.com/apache/hadoop/pull/131#discussion_r80839392 --- Diff: hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md --- @@ -729,3 +757,278 @@ usage Usage: `hadoop fs -usage command` Return the help for an individual command. + + +Working with Object Storage + + +The Hadoop FileSystem shell works with Object Stores such as Amazon S3, +Azure WASB and OpenStack Swift. + + + +```bash +# Create a directory +hadoop fs -mkdir s3a://bucket/datasets/ + +# Upload a file from the cluster filesystem +hadoop fs -put /datasets/example.orc s3a://bucket/datasets/ + +# touch a file +hadoop fs -touchz wasb://yourcontai...@youraccount.blob.core.windows.net/touched +``` + +Unlike a normal filesystem, renaming files and directories in an object store +usually takes time proportional to the size of the objects being manipulated. +As many of the filesystem shell operations +use renaming as the final stage in operations, skipping that stage +can avoid long delays. + +In particular, the `put` and `copyFromLocal` commands should +both have the `-d` options set for a direct upload. + + +```bash +# Upload a file from the cluster filesystem +hadoop fs -put -d /datasets/example.orc s3a://bucket/datasets/ + +# Upload a file from the local filesystem +hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/ --- End diff -- hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/ The symbol "~" is redundant, right? 
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528312#comment-15528312 ]

ASF GitHub Bot commented on HADOOP-13655:
-----------------------------------------

Github user yuanboliu commented on a diff in the pull request:

    https://github.com/apache/hadoop/pull/131#discussion_r80839511

    --- Diff: hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md ---
    @@ -729,3 +757,278 @@ usage
     Usage: `hadoop fs -usage command`

     Return the help for an individual command.
    +
    +
    +Working with Object Storage
    +
    +
    +The Hadoop FileSystem shell works with Object Stores such as Amazon S3,
    +Azure WASB and OpenStack Swift.
    +
    +
    +
    +```bash
    +# Create a directory
    +hadoop fs -mkdir s3a://bucket/datasets/
    +
    +# Upload a file from the cluster filesystem
    +hadoop fs -put /datasets/example.orc s3a://bucket/datasets/
    +
    +# touch a file
    +hadoop fs -touchz wasb://yourcontai...@youraccount.blob.core.windows.net/touched
    +```
    +
    +Unlike a normal filesystem, renaming files and directories in an object store
    +usually takes time proportional to the size of the objects being manipulated.
    +As many of the filesystem shell operations
    +use renaming as the final stage in operations, skipping that stage
    +can avoid long delays.
    +
    +In particular, the `put` and `copyFromLocal` commands should
    +both have the `-d` options set for a direct upload.
    +
    +
    +```bash
    +# Upload a file from the cluster filesystem
    +hadoop fs -put -d /datasets/example.orc s3a://bucket/datasets/
    +
    +# Upload a file from the local filesystem
    +hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/
    +
    +# create a file from stdin
    +echo "hello" | hadoop fs -put -d -f - wasb://yourcontai...@youraccount.blob.core.windows.net/hello.txt
    --- End diff --

    `hadoop fs -put -d -f - wasb:` should be `hadoop fs -put -d -f wasb:`
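A note on the lone `-` discussed in this comment: as far as the FileSystem shell documentation for `put` describes it, a source of `-` means "read the data from standard input", so in the quoted example the `-` appears to be the source argument and the `wasb://` URL the destination. The sketch below illustrates only that argument convention. `put_stub` is a hypothetical stand-in that never calls Hadoop, and `s3a://bucket/hello.txt` is a placeholder destination; it does not reproduce how the real command treats a missing argument.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for `hadoop fs -put`; it never contacts a filesystem.
put_stub() {
  # Skip option flags such as -d and -f. A lone "-" is a path argument.
  while [ $# -gt 0 ] && [ "$1" != "-" ] && [ "${1#-}" != "$1" ]; do
    shift
  done
  if [ "${1:-}" = "-" ]; then
    # Source is "-": consume standard input and report the destination.
    echo "read stdin, write to ${2:-}: $(cat)"
  else
    # No "-" source: standard input is never consulted.
    echo "stdin not read; path arguments: $*"
  fi
}

# With the "-": the piped text is the data, the URL is the destination.
with_dash=$(echo "hello" | put_stub -d -f - s3a://bucket/hello.txt)

# Without the "-": the URL is the only path argument left.
without_dash=$(put_stub -d -f s3a://bucket/hello.txt </dev/null)

echo "$with_dash"
echo "$without_dash"
```

In this stub, removing the `-` leaves the URL as the sole path argument and the piped data unused, which is why the placeholder matters in the documented example.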
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528313#comment-15528313 ]

ASF GitHub Bot commented on HADOOP-13655:
-----------------------------------------

Github user yuanboliu commented on a diff in the pull request:

    https://github.com/apache/hadoop/pull/131#discussion_r80836707

    --- Diff: hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md ---
    @@ -315,7 +324,11 @@ Returns 0 on success and -1 on error.

     Options:

    -The -f option will overwrite the destination if it already exists.
    +* `-p` : Preserves access and modification times, ownership and the permissions.
    +(assuming the permissions can be propagated across filesystems)
    +* `-f` : Overwrites the destination if it already exists.
    +* `-ignorecrc` : Skip CRC checks on the file(s) downloaded.
    +* `crc`: write CRC checksums for the files downloaded.
    --- End diff --

    `crc` should be `-crc`
[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp
[ https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522961#comment-15522961 ]

ASF GitHub Bot commented on HADOOP-13655:
-----------------------------------------

GitHub user steveloughran opened a pull request:

    https://github.com/apache/hadoop/pull/131

    HADOOP-13655

    Patch of filesystem shell & distcp docs to cover object stores.

    Also updated some references in filesystem/index.md which were out of date

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/hadoop s3/HADOOP-13655-shell-docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #131

commit 0a76336a0515e136474cc62b7e1b97aa175f7d10
Author: Steve Loughran
Date:   2016-09-26T12:44:59Z

    HADOOP-13655 patch 001 of docs