[2/5] hadoop git commit: HDFS-10678. Documenting NNThroughputBenchmark tool. (Contributed by Mingliang Liu)

2016-08-15 Thread liuml07
HDFS-10678. Documenting NNThroughputBenchmark tool. (Contributed by Mingliang 
Liu)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/f9a7e590
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/f9a7e590
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/f9a7e590

Branch: refs/heads/branch-2
Commit: f9a7e59066384c19b482231a1c1ed40a5324d829
Parents: e36a913
Author: Mingliang Liu 
Authored: Mon Aug 15 20:22:14 2016 -0700
Committer: Mingliang Liu 
Committed: Mon Aug 15 20:37:55 2016 -0700

--
 .../src/site/markdown/Benchmarking.md   | 106 +++
 .../server/namenode/NNThroughputBenchmark.java  |  32 +-
 hadoop-project/src/site/site.xml                |   1 +
 3 files changed, 110 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/hadoop/blob/f9a7e590/hadoop-common-project/hadoop-common/src/site/markdown/Benchmarking.md
--
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/Benchmarking.md b/hadoop-common-project/hadoop-common/src/site/markdown/Benchmarking.md
new file mode 100644
index 000..678dcee
--- /dev/null
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/Benchmarking.md
@@ -0,0 +1,106 @@
+
+
+# Hadoop Benchmarking
+
+
+
+This page discusses benchmarking Hadoop using the tools it provides.
+
+## NNThroughputBenchmark
+
+### Overview
+
+**NNThroughputBenchmark**, as its name indicates, is a name-node throughput benchmark, which runs a series of client threads on a single node against a name-node. If no name-node is configured, it first starts a name-node in the same process (_standalone mode_), in which case each client repeatedly performs the same operation by directly calling the respective name-node methods. Otherwise, the benchmark performs the operations against a remote name-node via client protocol RPCs (_remote mode_). Either way, all clients run locally in a single process rather than remotely across different nodes. The goal is to minimize the communication overhead caused by RPC connections and serialization (which standalone mode avoids entirely), and thus reveal the upper bound of pure name-node performance.
+
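The following Java sketch illustrates the client model described above: several client threads in one process issue the same operation against a single name-node target. The `NameNodeTarget` interface, the in-memory `LocalTarget`, and `ClientThreadsSketch` are hypothetical stand-ins, not Hadoop APIs; the real benchmark calls name-node methods directly in standalone mode and goes through client protocol RPCs in remote mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over "the name-node": in standalone mode this
// stands for the in-process name-node, in remote mode for an RPC proxy.
interface NameNodeTarget {
  boolean mkdirs(String path);
}

// Hypothetical in-memory target standing in for an in-process name-node.
class LocalTarget implements NameNodeTarget {
  private final ConcurrentHashMap<String, Boolean> namespace = new ConcurrentHashMap<>();

  @Override
  public boolean mkdirs(String path) {
    // Returns true only the first time a path is created.
    return namespace.putIfAbsent(path, Boolean.TRUE) == null;
  }

  int size() {
    return namespace.size();
  }
}

class ClientThreadsSketch {
  // Starts `threads` client threads in this process; each repeatedly calls
  // the same operation on the shared target, then waits for all of them.
  static void run(NameNodeTarget target, int threads, int opsPerThread) {
    List<Thread> clients = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      final int id = t;
      Thread client = new Thread(() -> {
        for (int i = 0; i < opsPerThread; i++) {
          target.mkdirs("/bench/thread" + id + "/dir" + i);
        }
      });
      clients.add(client);
      client.start();
    }
    for (Thread client : clients) {
      try {
        client.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for clients", e);
      }
    }
  }
}
```

For example, `ClientThreadsSketch.run(new LocalTarget(), 3, 10)` issues 30 `mkdirs` calls from three threads and leaves 30 distinct paths in the in-memory namespace. Swapping in a different `NameNodeTarget` changes where the calls go without touching the client loop, which is the essence of the two modes.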
+The benchmark first generates inputs for each thread so that the input generation overhead does not affect the resulting statistics. The threads perform practically the same number of operations: the numbers of operations performed by any two threads differ by at most 1. The benchmark then executes the specified number of operations using the specified number of threads and reports the resulting statistics as the number of operations performed by the name-node per second.
+
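One simple way to obtain the property stated above, that per-thread operation counts differ by at most 1, is to give every thread `total / threads` operations and hand out the remainder one at a time. The Java sketch below does that and also computes operations per second from an elapsed wall-clock time. The class and method names are hypothetical, and this is an illustration of the property, not the benchmark's own code.

```java
class WorkSplitSketch {
  // Splits totalOps across threads so that any two counts differ by at most 1:
  // the first (totalOps % threads) threads each get one extra operation.
  static int[] opsPerThread(int totalOps, int threads) {
    if (threads <= 0 || totalOps < 0) {
      throw new IllegalArgumentException("need threads > 0 and totalOps >= 0");
    }
    int[] counts = new int[threads];
    int base = totalOps / threads;
    int remainder = totalOps % threads;
    for (int t = 0; t < threads; t++) {
      counts[t] = base + (t < remainder ? 1 : 0);
    }
    return counts;
  }

  // Throughput metric described above: operations completed per second.
  static double opsPerSecond(long ops, long elapsedMillis) {
    if (elapsedMillis <= 0) {
      throw new IllegalArgumentException("elapsedMillis must be positive");
    }
    return ops * 1000.0 / elapsedMillis;
  }
}
```

For instance, splitting 10 operations over 3 threads yields `{4, 3, 3}`, and 1000 operations finished in 2000 ms correspond to 500.0 operations per second.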
+### Commands
+
+The general command line syntax is:
+
+`hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark [genericOptions] [commandOptions]`
+
+#### Generic Options
+
+This benchmark honors the [Hadoop command-line Generic Options](CommandsManual.html#Generic_Options) to alter its behavior. Like other tools, the benchmark relies on the `fs.defaultFS` config, which can be overridden by the `-fs` generic option, to decide whether to run in standalone mode or remote mode. If the `fs.defaultFS` scheme is not specified or is `file` (local), the benchmark runs in _standalone mode_. Note that the _remote_ name-node config `dfs.namenode.fs-limits.min-block-size` should be set to 16, whereas in _standalone mode_ the benchmark turns off minimum block size verification for its internal name-node.
+
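The mode-selection rule above can be sketched in Java as follows. The `isStandalone` helper is hypothetical and only mirrors the documented rule (unset, no scheme, or the `file` scheme means standalone mode); it parses the value with `java.net.URI` rather than Hadoop's own configuration classes, and the host name in the usage note is a placeholder.

```java
import java.net.URI;

class ModeSketch {
  // Mirrors the documented rule: if fs.defaultFS is unset, has no scheme,
  // or uses the "file" scheme, run in standalone mode; otherwise remote mode.
  static boolean isStandalone(String defaultFs) {
    if (defaultFs == null || defaultFs.isEmpty()) {
      return true;
    }
    String scheme = URI.create(defaultFs).getScheme();
    return scheme == null || scheme.equalsIgnoreCase("file");
  }
}
```

Under this rule `file:///` and a scheme-less value such as `/tmp` select standalone mode, while a value like `hdfs://nn.example.com:8020` (for example passed via `-fs`) selects remote mode.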
+#### Command Options
+
+The following are all supported command options:
+
+| COMMAND\_OPTION | Description |
+|:---- |:---- |
+|`-op` | Specify the operation. This option must be provided and should be the first option. |
+|`-logLevel` | Specify the logging level when the benchmark runs. The default logging level is ERROR. |
+|`-UGCacheRefreshCount` | After every specified number of operations, the benchmark purges the name-node's user group cache. By default the cache is never refreshed. |
+|`-keepResults` | If specified, do not clean up the name-space after execution. By default the name-space is removed after the test. |
+
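To make the option rules in the table concrete, here is a hypothetical Java parser for just these four command options: `-op` must come first, `-logLevel` defaults to `ERROR`, `-UGCacheRefreshCount` defaults to 0 (never refresh), and `-keepResults` is a flag. It is a sketch under those assumptions only; the real benchmark's parsing, including the operation-specific parameters, may differ, and unrecognized arguments are simply skipped here.

```java
class CommandOptionsSketch {
  final String op;
  String logLevel = "ERROR";    // documented default logging level
  int ugCacheRefreshCount = 0;  // 0 = never purge the user group cache
  boolean keepResults = false;  // by default the name-space is cleaned up

  private CommandOptionsSketch(String op) {
    this.op = op;
  }

  static CommandOptionsSketch parse(String[] args) {
    // -op is mandatory and must be the first option.
    if (args.length < 2 || !args[0].equals("-op")) {
      throw new IllegalArgumentException("-op <operation> must be the first option");
    }
    CommandOptionsSketch opts = new CommandOptionsSketch(args[1]);
    for (int i = 2; i < args.length; i++) {
      switch (args[i]) {
        case "-logLevel":
          opts.logLevel = requireValue(args, ++i);
          break;
        case "-UGCacheRefreshCount":
          opts.ugCacheRefreshCount = Integer.parseInt(requireValue(args, ++i));
          break;
        case "-keepResults":
          opts.keepResults = true;
          break;
        default:
          // Operation-specific parameters (e.g. -threads) are out of scope here.
          break;
      }
    }
    return opts;
  }

  // True when the cache should be purged after operation number opIndex (1-based).
  boolean shouldRefreshCache(long opIndex) {
    return ugCacheRefreshCount > 0 && opIndex % ugCacheRefreshCount == 0;
  }

  private static String requireValue(String[] args, int i) {
    if (i >= args.length) {
      throw new IllegalArgumentException("missing value for " + args[i - 1]);
    }
    return args[i];
  }
}
```

With `-op mkdirs -UGCacheRefreshCount 5`, the sketch purges the cache after operations 5, 10, 15, and so on, while an argument list that does not start with `-op` is rejected.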
+##### Operations Supported
+
+The following are all the supported operations, along with their operation-specific parameters (all optional) and default values.
+
+| OPERATION\_OPTION | Operation-specific parameters |
+|:---- |:---- |
+|`all` | _options for other operations_ |
+|`create` | [`-threads 3`] [`-files 10`] [`-filesPerDir 4`] [`-close`] |
+|`mkdirs` | [`-threads 3`] [`-dirs 10`] [`-dirsPerDir 2`] |
+|`open` | [`-threads 3`] [`-files 10`] [`-filesPerDir 4`] [`-useExisting`] |
+|`delete` | [`-threads 3`] [`-files 10`] [`-filesPerDir 4`] 

[2/5] hadoop git commit: HDFS-10678. Documenting NNThroughputBenchmark tool. (Contributed by Mingliang Liu)

2016-08-15 Thread liuml07
HDFS-10678. Documenting NNThroughputBenchmark tool. (Contributed by Mingliang 
Liu)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/0b934c37
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/0b934c37
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/0b934c37

Branch: refs/heads/branch-2.8
Commit: 0b934c375ef957ae635a893a6c49fa89068c0227
Parents: 6471ec3
Author: Mingliang Liu 
Authored: Mon Aug 15 20:22:14 2016 -0700
Committer: Mingliang Liu 
Committed: Mon Aug 15 20:46:28 2016 -0700

--
 .../src/site/markdown/Benchmarking.md   | 106 +++
 .../server/namenode/NNThroughputBenchmark.java  |  32 +-
 hadoop-project/src/site/site.xml                |   1 +
 3 files changed, 110 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/hadoop/blob/0b934c37/hadoop-common-project/hadoop-common/src/site/markdown/Benchmarking.md