[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688017#comment-15688017
 ] 

Hudson commented on HADOOP-13655:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10875 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10875/])
HADOOP-13655. document object store use with fs shell and distcp. (liuml07: rev beb70fed4f15cd4afe8ea23e6068a8344d3557b1)
* (edit) hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md
* (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
* (edit) hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md


> document object store use with fs shell and distcp
> --
>
> Key: HADOOP-13655
> URL: https://issues.apache.org/jira/browse/HADOOP-13655
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs, fs/s3
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 2.7.4, 3.0.0-alpha2
>
> Attachments: HADOOP-13655.000.patch
>
>
> There's no specific docs for working with object stores from the {{hadoop 
> fs}} shell or in distcp; people either suffer from this (performance, 
> billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object 
> stores.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-11-22 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687979#comment-15687979
 ] 

Mingliang Liu commented on HADOOP-13655:


Per offline discussion with Steve, the patch should go to {{trunk}} as well. 
The changes are pretty much the same across branches.




[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687920#comment-15687920
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user asfgit closed the pull request at:

https://github.com/apache/hadoop/pull/131


> document object store use with fs shell and distcp
> --
>
> Key: HADOOP-13655
> URL: https://issues.apache.org/jira/browse/HADOOP-13655
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs, fs/s3
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13655.000.patch
>
>
> There's no specific docs for working with object stores from the {{hadoop 
> fs}} shell or in distcp; people either suffer from this (performance, 
> billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object 
> stores.






[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685466#comment-15685466
 ] 

Hadoop QA commented on HADOOP-13655:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 
49s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 1s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 10 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:b59b8b7 |
| JIRA Issue | HADOOP-13655 |
| GITHUB PR | https://github.com/apache/hadoop/pull/131 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 8b08858dd925 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2 / 4b289d5 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3/artifact/patchprocess/whitespace-eol.txt
 |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp 
U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.
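
The whitespace check in the report above flags added lines that end in spaces or tabs. A minimal sketch of a local pre-check, assuming the patch is a unified diff saved to a file. The function name `count_trailing_ws` is hypothetical and is not part of Apache Yetus; it only approximates what the report counts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Apache Yetus): count the added lines
# ("+" prefix) in a unified diff that end in a space or a tab.
# File header lines beginning with "+++" are scanned as well.
count_trailing_ws() {
  local patch_file="$1"
  # grep -c prints the number of matching lines. It exits non-zero when
  # the count is 0, so "|| true" keeps the function usable under "set -e".
  grep -cE '^\+.*[[:space:]]$' "$patch_file" || true
}
```

Running `count_trailing_ws HADOOP-13655.000.patch` before uploading would give a quick count; `git apply --whitespace=fix` is the remedy the report itself suggests.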






[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683631#comment-15683631
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r88898982
  
--- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm ---
@@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources
 
   The SSL configuration file must be in the class-path of the DistCp program.
 
+$H3 DistCp and Object Stores
+
+DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift.
+
+Prequisites
+
+1. The JAR containing the object store implementation is on the classpath,
+along with all of its dependencies.
+1. Unless the JAR automatically registers its bundled filesystem clients,
+the configuration may need to be modified to state the class which
+implements the filesystem schema. All of the ASF's own object store clients
+are self-registering.
+1. The relevant object store access credentials must be available in the cluster
+configuration, or be otherwise available in all cluster hosts.
+
+DistCp can be used to upload data
+
+```bash
+hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
+```
+
+To download data
+
+```bash
+hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results
+```
+
+To copy data between object stores
+
+```bash
+hadoop distcp s3a://bucket/generated/results \
+  wasb://upda...@example.blob.core.windows.net
+```
+
+And do copy data within an object store
+
+```bash
+hadoop distcp wasb://upda...@example.blob.core.windows.net/current \
+  wasb://upda...@example.blob.core.windows.net/old
+```
+
+And to use `-update` to only copy changed files.
+
+```bash
+hadoop distcp -update -numListstatusThreads 20  \
+  swift://history.cluster1/2016 \
+  hdfs://nn1:8020/history/2016
+```
+
+Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
+on a large directory tree (the limit is 40 threads).
+
+When `DistCp -update` is used with objec stores,
+generally only the modification time and length of the individual files are compared,
+not any checksums. The fact that most object stores do have valid timestamps
+for directories is irrelevant; only the file timestamps are compared.
+However, it is important to have the clock of the client computers close
+to that of the infrastructure, so that timestamps are consistent between
+the client/HDFS cluster and that of the object store. Otherwise, changed files may be
+missed/copied too often.
+
+**Notes**
+
+* The `-atomic` option causes a rename of the temporary data, so significantly
+increases the time to commit work at the end of the operation. Furthermore,
+as Object Stores other than (optionally) `wasb://` do not offer atomic renames of directories
+the `-atomic` operation doesn't actually deliver what is promised. *Avoid*.
+
+* The `-append` option is not supported.
+
+* The `-diff` option is not supported
--- End diff --

ok


> document object store use with fs shell and distcp
> --
>
> Key: HADOOP-13655
> URL: https://issues.apache.org/jira/browse/HADOOP-13655
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs, fs/s3
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> There's no specific docs for working with object stores from the {{hadoop 
> fs}} shell or in distcp; people either suffer from this (performance, 
> billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object 
> stores.






[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671870#comment-15671870
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user liuml07 commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r88332309
  
--- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm ---
@@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources
 
   The SSL configuration file must be in the class-path of the DistCp program.
 
+$H3 DistCp and Object Stores
+
+DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift.
+
+Prequisites
+
+1. The JAR containing the object store implementation is on the classpath,
+along with all of its dependencies.
+1. Unless the JAR automatically registers its bundled filesystem clients,
+the configuration may need to be modified to state the class which
+implements the filesystem schema. All of the ASF's own object store clients
+are self-registering.
+1. The relevant object store access credentials must be available in the cluster
+configuration, or be otherwise available in all cluster hosts.
+
+DistCp can be used to upload data
+
+```bash
+hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
+```
+
+To download data
+
+```bash
+hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results
+```
+
+To copy data between object stores
+
+```bash
+hadoop distcp s3a://bucket/generated/results \
+  wasb://upda...@example.blob.core.windows.net
+```
+
+And do copy data within an object store
+
+```bash
+hadoop distcp wasb://upda...@example.blob.core.windows.net/current \
+  wasb://upda...@example.blob.core.windows.net/old
+```
+
+And to use `-update` to only copy changed files.
+
+```bash
+hadoop distcp -update -numListstatusThreads 20  \
+  swift://history.cluster1/2016 \
+  hdfs://nn1:8020/history/2016
+```
+
+Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
+on a large directory tree (the limit is 40 threads).
+
+When `DistCp -update` is used with objec stores,
--- End diff --

objec -> object





[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671867#comment-15671867
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user liuml07 commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r88343643
  
--- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm ---
@@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources
 
   The SSL configuration file must be in the class-path of the DistCp program.
 
+$H3 DistCp and Object Stores
+
+DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift.
+
+Prequisites
+
+1. The JAR containing the object store implementation is on the classpath,
+along with all of its dependencies.
+1. Unless the JAR automatically registers its bundled filesystem clients,
+the configuration may need to be modified to state the class which
+implements the filesystem schema. All of the ASF's own object store clients
+are self-registering.
+1. The relevant object store access credentials must be available in the cluster
+configuration, or be otherwise available in all cluster hosts.
+
+DistCp can be used to upload data
+
+```bash
+hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
+```
+
+To download data
+
+```bash
+hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results
+```
+
+To copy data between object stores
+
+```bash
+hadoop distcp s3a://bucket/generated/results \
+  wasb://upda...@example.blob.core.windows.net
+```
+
+And do copy data within an object store
+
+```bash
+hadoop distcp wasb://upda...@example.blob.core.windows.net/current \
+  wasb://upda...@example.blob.core.windows.net/old
+```
+
+And to use `-update` to only copy changed files.
+
+```bash
+hadoop distcp -update -numListstatusThreads 20  \
+  swift://history.cluster1/2016 \
+  hdfs://nn1:8020/history/2016
+```
+
+Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
+on a large directory tree (the limit is 40 threads).
+
+When `DistCp -update` is used with objec stores,
+generally only the modification time and length of the individual files are compared,
+not any checksums. The fact that most object stores do have valid timestamps
+for directories is irrelevant; only the file timestamps are compared.
+However, it is important to have the clock of the client computers close
+to that of the infrastructure, so that timestamps are consistent between
+the client/HDFS cluster and that of the object store. Otherwise, changed files may be
+missed/copied too often.
+
+**Notes**
+
+* The `-atomic` option causes a rename of the temporary data, so significantly
+increases the time to commit work at the end of the operation. Furthermore,
+as Object Stores other than (optionally) `wasb://` do not offer atomic renames of directories
+the `-atomic` operation doesn't actually deliver what is promised. *Avoid*.
+
+* The `-append` option is not supported.
+
+* The `-diff` option is not supported
+ 
+* CRC checking will not be performed, irrespective of the value of the `-skipCrc`
+flag.
+
+* All `-p` options, including those to preserve permissions, user and group information, attributes
+checksums and replication are generally ignored. The `wasb://` connector will
+preserve the information, but not enforce the permissions.
+
+* Some object store connectors offer an option for in-memory buffering of
+output —for example the S3A connector. Using such option while copying
+large files may trigger some form of out of memory event,
+be it a heap overflow or a YARN container termination.
+This is particularly common if the network bandwidth
+between the cluster and the object store is limited (such as when working
+with remote object stores). It is best to disable/avoid such options and
+rely on disk buffering.
+
+* Copy operations within a single object store still take place in the Hadoop cluster
+—even when the object store implements a more efficient COPY operation 
internally
+
+That is, an operation such as
--- End diff --

The indentation is unnecessary?


> document object store use with fs shell and distcp
> --
>
> Key: HADOOP-13655
> URL: https://issues.apache.org/jira/browse/HADOOP-13655
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs, fs/s3
>Affects 

[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671868#comment-15671868
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user liuml07 commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r88342205
  
--- Diff: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm ---
@@ -470,6 +470,105 @@ $H3 SSL Configurations for HSFTP sources
 
   The SSL configuration file must be in the class-path of the DistCp program.
 
+$H3 DistCp and Object Stores
+
+DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift.
+
+Prequisites
+
+1. The JAR containing the object store implementation is on the classpath,
+along with all of its dependencies.
+1. Unless the JAR automatically registers its bundled filesystem clients,
+the configuration may need to be modified to state the class which
+implements the filesystem schema. All of the ASF's own object store clients
+are self-registering.
+1. The relevant object store access credentials must be available in the cluster
+configuration, or be otherwise available in all cluster hosts.
+
+DistCp can be used to upload data
+
+```bash
+hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
+```
+
+To download data
+
+```bash
+hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results
+```
+
+To copy data between object stores
+
+```bash
+hadoop distcp s3a://bucket/generated/results \
+  wasb://upda...@example.blob.core.windows.net
+```
+
+And do copy data within an object store
+
+```bash
+hadoop distcp wasb://upda...@example.blob.core.windows.net/current \
+  wasb://upda...@example.blob.core.windows.net/old
+```
+
+And to use `-update` to only copy changed files.
+
+```bash
+hadoop distcp -update -numListstatusThreads 20  \
+  swift://history.cluster1/2016 \
+  hdfs://nn1:8020/history/2016
+```
+
+Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
+on a large directory tree (the limit is 40 threads).
+
+When `DistCp -update` is used with objec stores,
+generally only the modification time and length of the individual files are compared,
+not any checksums. The fact that most object stores do have valid timestamps
+for directories is irrelevant; only the file timestamps are compared.
+However, it is important to have the clock of the client computers close
+to that of the infrastructure, so that timestamps are consistent between
+the client/HDFS cluster and that of the object store. Otherwise, changed files may be
+missed/copied too often.
+
+**Notes**
+
+* The `-atomic` option causes a rename of the temporary data, so significantly
+increases the time to commit work at the end of the operation. Furthermore,
+as Object Stores other than (optionally) `wasb://` do not offer atomic renames of directories
+the `-atomic` operation doesn't actually deliver what is promised. *Avoid*.
+
+* The `-append` option is not supported.
+
+* The `-diff` option is not supported
--- End diff --

The `-diff/-rdiff` option is not supported

Yes, there is an `rdiff` option that was just added.





[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671869#comment-15671869
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user liuml07 commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r88338833
  
--- Diff: hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md ---
@@ -729,3 +757,280 @@ usage
 Usage: `hadoop fs -usage command`
 
 Return the help for an individual command.
+
+
+Working with Object Storage
--- End diff --

 `` is here accidentally, I guess?





[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602414#comment-15602414
 ] 

Hadoop QA commented on HADOOP-13655:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
38s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 57 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:b59b8b7 |
| JIRA Issue | HADOOP-13655 |
| GITHUB PR | https://github.com/apache/hadoop/pull/131 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 997497522d2b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2 / 086577c |
| whitespace | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/10871/artifact/patchprocess/whitespace-eol.txt
 |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp 
U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/10871/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> document object store use with fs shell and distcp
> --
>
> Key: HADOOP-13655
> URL: https://issues.apache.org/jira/browse/HADOOP-13655
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs, fs/s3
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> There's no specific docs for working with object stores from the {{hadoop 
> fs}} shell or in distcp; people either suffer from this (performance, 
> billing), or learn through trial and error what to do.
> Add a section in both fs shell and distcp docs covering use with object 
> stores.






[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-09-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15529527#comment-15529527
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r80909422
  
--- Diff: hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md ---
@@ -315,7 +324,11 @@ Returns 0 on success and -1 on error.
 
 Options:
 
-The -f option will overwrite the destination if it already exists.
+* `-p` : Preserves access and modification times, ownership and the permissions.
+(assuming the permissions can be propagated across filesystems)
+* `-f` : Overwrites the destination if it already exists.
+* `-ignorecrc` : Skip CRC checks on the file(s) downloaded.
+* `crc`: write CRC checksums for the files downloaded.
--- End diff --

fixed





[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-09-27 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528321#comment-15528321
 ] 

Yuanbo Liu commented on HADOOP-13655:
-

[~ste...@apache.org] I've reviewed your pull request on GitHub. Great work! 
Since I don't know much about object stores, I only found some trivial 
mistakes there. I would be glad to test those commands if I had an object store 
environment.
Thanks again for your work, well done!





[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-09-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528314#comment-15528314
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user yuanboliu commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r80839392
  
--- Diff: 
hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md ---
@@ -729,3 +757,278 @@ usage
 Usage: `hadoop fs -usage command`
 
 Return the help for an individual command.
+
+
+Working with Object Storage
+
+
+The Hadoop FileSystem shell works with Object Stores such as Amazon S3, 
+Azure WASB and OpenStack Swift.
+
+
+
+```bash
+# Create a directory
+hadoop fs -mkdir s3a://bucket/datasets/
+
+# Upload a file from the cluster filesystem
+hadoop fs -put /datasets/example.orc s3a://bucket/datasets/
+
+# touch a file
+hadoop fs -touchz wasb://yourcontai...@youraccount.blob.core.windows.net/touched
+```
+
+Unlike a normal filesystem, renaming files and directories in an object store
+usually takes time proportional to the size of the objects being manipulated.
+As many of the filesystem shell operations
+use renaming as the final stage in operations, skipping that stage
+can avoid long delays.
+ 
+In particular, the `put` and `copyFromLocal` commands should
+both have the `-d` options set for a direct upload.
+
+
+```bash
+# Upload a file from the cluster filesystem
+hadoop fs -put -d /datasets/example.orc s3a://bucket/datasets/
+
+# Upload a file from the local filesystem
+hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/
--- End diff --

hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/
The symbol "~" is redundant, right?
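
For context on the question above, a minimal sketch of how a POSIX-style shell such as bash treats the leading `~`: it is expanded to the user's home directory before `hadoop` ever sees the argument, so `~/datasets/devices.orc` and `/datasets/devices.orc` name different local files. The home directory `/home/alice` is a hypothetical value used only for illustration.

```bash
# Sketch only: /home/alice is a hypothetical home directory, set inside a
# subshell so the caller's real HOME is left untouched.
expanded=$(HOME=/home/alice; echo ~/datasets/devices.orc)
echo "$expanded"

# A quoted tilde is not expanded; the command receives it literally.
literal=$(echo "~/datasets/devices.orc")
echo "$literal"
```

The first form is what the documented `-copyFromLocal` example relies on: the shell, not the Hadoop client, resolves the path.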





[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-09-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528312#comment-15528312
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user yuanboliu commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r80839511
  
--- Diff: 
hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md ---
@@ -729,3 +757,278 @@ usage
 Usage: `hadoop fs -usage command`
 
 Return the help for an individual command.
+
+
+Working with Object Storage
+
+
+The Hadoop FileSystem shell works with Object Stores such as Amazon S3, 
+Azure WASB and OpenStack Swift.
+
+
+
+```bash
+# Create a directory
+hadoop fs -mkdir s3a://bucket/datasets/
+
+# Upload a file from the cluster filesystem
+hadoop fs -put /datasets/example.orc s3a://bucket/datasets/
+
+# touch a file
+hadoop fs -touchz wasb://yourcontai...@youraccount.blob.core.windows.net/touched
+```
+
+Unlike a normal filesystem, renaming files and directories in an object store
+usually takes time proportional to the size of the objects being manipulated.
+As many of the filesystem shell operations
+use renaming as the final stage in operations, skipping that stage
+can avoid long delays.
+ 
+In particular, the `put` and `copyFromLocal` commands should
+both have the `-d` options set for a direct upload.
+
+
+```bash
+# Upload a file from the cluster filesystem
+hadoop fs -put -d /datasets/example.orc s3a://bucket/datasets/
+
+# Upload a file from the local filesystem
+hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/
+
+# create a file from stdin
+echo "hello" | hadoop fs -put -d -f - wasb://yourcontai...@youraccount.blob.core.windows.net/hello.txt
--- End diff --

`hadoop fs -put -d -f - wasb:` should be `hadoop fs -put -d -f wasb:`
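
A note on the lone `-` being discussed: by common Unix convention a bare `-` in the source position names standard input, and the `put` usage text describes reading from stdin when the source is set to `-`. If that holds for the Hadoop version in use, the `-` is the source argument of the piped command rather than a stray character. The sketch below uses `cat -` as a stand-in, since the real upload needs a configured Hadoop client; `<destination>` is a hypothetical placeholder, not a path from the patch.

```bash
# Sketch only: `cat -` stands in for a command that treats "-" as standard input.
result=$(echo "hello" | cat -)
echo "$result"

# The documented upload has the same shape, with "-" as the source argument.
# It is left commented out because it needs a working Hadoop client and a
# real object store URL in place of the hypothetical <destination>:
# echo "hello" | hadoop fs -put -d -f - <destination>
```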





[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-09-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528313#comment-15528313
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

Github user yuanboliu commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/131#discussion_r80836707
  
--- Diff: 
hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md ---
@@ -315,7 +324,11 @@ Returns 0 on success and -1 on error.
 
 Options:
 
-The -f option will overwrite the destination if it already exists.
+* `-p` : Preserves access and modification times, ownership and the permissions.
+(assuming the permissions can be propagated across filesystems)
+* `-f` : Overwrites the destination if it already exists.
+* `-ignorecrc` : Skip CRC checks on the file(s) downloaded.
+* `crc`: write CRC checksums for the files downloaded.
--- End diff --

`crc` should be `-crc`





[jira] [Commented] (HADOOP-13655) document object store use with fs shell and distcp

2016-09-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522961#comment-15522961
 ] 

ASF GitHub Bot commented on HADOOP-13655:
-

GitHub user steveloughran opened a pull request:

https://github.com/apache/hadoop/pull/131

HADOOP-13655

Patch to the filesystem shell and distcp docs to cover object stores. Also 
updates some out-of-date references in filesystem/index.md.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/hadoop s3/HADOOP-13655-shell-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #131


commit 0a76336a0515e136474cc62b7e1b97aa175f7d10
Author: Steve Loughran 
Date:   2016-09-26T12:44:59Z

HADOOP-13655 patch 001 of docs



