[15/50] [abbrv] hbase git commit: HBASE-20337 Update the doc on how to setup shortcircuit reads; its stale

2018-04-08 Thread zhangduo
HBASE-20337 Update the doc on how to setup shortcircuit reads; its stale


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/d60decd9
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/d60decd9
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/d60decd9

Branch: refs/heads/HBASE-19064
Commit: d60decd959d4556caa54a3f355e246372d0147e5
Parents: 0c0fe05
Author: Michael Stack 
Authored: Tue Apr 3 10:27:38 2018 -0700
Committer: Michael Stack 
Committed: Wed Apr 4 11:20:58 2018 -0700

--
 src/main/asciidoc/_chapters/schema_design.adoc | 26 ++---
 1 file changed, 23 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/hbase/blob/d60decd9/src/main/asciidoc/_chapters/schema_design.adoc
--
diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc
index 4cd7656..12d449b 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -1148,16 +1148,36 @@ Detect regionserver failure as fast as reasonable. Set the following parameters:
 - `dfs.namenode.avoid.read.stale.datanode = true`
 - `dfs.namenode.avoid.write.stale.datanode = true`
 
+[[shortcircuit.reads]]
 ===  Optimize on the Server Side for Low Latency
-
-* Skip the network for local blocks. In `hbase-site.xml`, set the following parameters:
+Skip the network for local blocks when the RegionServer goes to read from HDFS by exploiting HDFS's
+link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html[Short-Circuit Local Reads] facility.
+Note how setup must be done both at the datanode and on the dfsclient ends of the conneciton -- i.e. at the RegionServer
+and how both ends need to have loaded the hadoop native `.so` library.
+After configuring your hadoop setting _dfs.client.read.shortcircuit_ to _true_ and configuring
+the _dfs.domain.socket.path_ path for the datanode and dfsclient to share and restarting, next configure
+the regionserver/dfsclient side.
+
+* In `hbase-site.xml`, set the following parameters:
 - `dfs.client.read.shortcircuit = true`
-- `dfs.client.read.shortcircuit.buffer.size = 131072` (Important to avoid OOME)
+- `dfs.client.read.shortcircuit.skip.checksum = true` so we don't double checksum (HBase does its own checksumming to save on i/os. See <> for more on this.
+- `dfs.domain.socket.path` to match what was set for the datanodes.
+- `dfs.client.read.shortcircuit.buffer.size = 131072` Important to avoid OOME -- hbase has a default it uses if unset, see `hbase.dfs.client.read.shortcircuit.buffer.size`; its default is 131072.
 * Ensure data locality. In `hbase-site.xml`, set `hbase.hstore.min.locality.to.skip.major.compact = 0.7` (Meaning that 0.7 \<= n \<= 1)
 * Make sure DataNodes have enough handlers for block transfers. In `hdfs-site.xml`, set the following parameters:
 - `dfs.datanode.max.xcievers >= 8192`
 - `dfs.datanode.handler.count =` number of spindles
 
+Check the RegionServer logs after restart. You should only see complaint if misconfiguration.
+Otherwise, shortcircuit read operates quietly in background. It does not provide metrics so
+no optics on how effective it is but read latencies should show a marked improvement, especially if
+good data locality, lots of random reads, and dataset is larger than available cache.
+
+For more on short-circuit reads, see Colin's old blog on rollout,
+link:http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/[How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop].
+The link:https://issues.apache.org/jira/browse/HDFS-347[HDFS-347] issue also makes for an
+interesting read showing the HDFS community at its best (caveat a few comments).
+
 ===  JVM Tuning
 
   Tune JVM GC for low collection latencies
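
As an illustration of the RegionServer-side settings the patch above documents, an `hbase-site.xml` fragment might look like the sketch below. The property names and the values `true` and `131072` are taken from the patch text. The socket path is a placeholder assumption and must be replaced with whatever `dfs.domain.socket.path` is set to on the DataNodes.

```xml
<!-- hbase-site.xml, RegionServer (dfsclient) side: a sketch, not a complete file -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <!-- HBase does its own checksumming, so skip the HDFS-level check -->
    <name>dfs.client.read.shortcircuit.skip.checksum</name>
    <value>true</value>
  </property>
  <property>
    <!-- Placeholder path (assumption): must match the DataNode setting -->
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
  <property>
    <!-- Same value the patch gives as the HBase default when unset -->
    <name>dfs.client.read.shortcircuit.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
```

Per the patch, `dfs.client.read.shortcircuit` and `dfs.domain.socket.path` also have to be set on the Hadoop (DataNode) side, with a restart, before this client-side configuration takes effect.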



[16/50] [abbrv] hbase git commit: HBASE-20337 Update the doc on how to setup shortcircuit reads; its stale; ADDENDUM

2018-04-08 Thread zhangduo
HBASE-20337 Update the doc on how to setup shortcircuit reads; its stale; ADDENDUM


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/8bc72347
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/8bc72347
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/8bc72347

Branch: refs/heads/HBASE-19064
Commit: 8bc723477b60ce1ed0a71081630459621cb0f284
Parents: d60decd
Author: Michael Stack 
Authored: Wed Apr 4 11:25:25 2018 -0700
Committer: Michael Stack 
Committed: Wed Apr 4 11:25:25 2018 -0700

--
 src/main/asciidoc/_chapters/schema_design.adoc | 5 +
 1 file changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/hbase/blob/8bc72347/src/main/asciidoc/_chapters/schema_design.adoc
--
diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc
index 12d449b..a25b85e 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -1173,6 +1173,11 @@ Otherwise, shortcircuit read operates quietly in background. It does not provide
 no optics on how effective it is but read latencies should show a marked improvement, especially if
 good data locality, lots of random reads, and dataset is larger than available cache.
 
+Other advanced configurations that you might play with, especially if shortcircuit functionality
+is complaining in the logs,  include `dfs.client.read.shortcircuit.streams.cache.size` and
+`dfs.client.socketcache.capacity`. Documentation is sparse on these options. You'll have to
+read source code.
+
 For more on short-circuit reads, see Colin's old blog on rollout,
 link:http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/[How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop].
 The link:https://issues.apache.org/jira/browse/HDFS-347[HDFS-347] issue also makes for an
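
The two advanced options named in the addendum above could be set in the same way as the other client-side properties. The fragment below is only a sketch: the property names come from the patch, but placing them in `hbase-site.xml` and the values shown are illustrative assumptions, not recommendations. As the patch notes, the source code is the place to check what each option does before changing it.

```xml
<!-- hbase-site.xml (assumed location): illustrative values only -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit.streams.cache.size</name>
    <!-- Assumed example value; tune only if the logs show complaints -->
    <value>256</value>
  </property>
  <property>
    <name>dfs.client.socketcache.capacity</name>
    <!-- Assumed example value -->
    <value>16</value>
  </property>
</configuration>
```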



hbase git commit: HBASE-20337 Update the doc on how to setup shortcircuit reads; its stale; ADDENDUM

2018-04-04 Thread stack
Repository: hbase
Updated Branches:
  refs/heads/master d60decd95 -> 8bc723477


HBASE-20337 Update the doc on how to setup shortcircuit reads; its stale; ADDENDUM


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/8bc72347
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/8bc72347
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/8bc72347

Branch: refs/heads/master
Commit: 8bc723477b60ce1ed0a71081630459621cb0f284
Parents: d60decd
Author: Michael Stack 
Authored: Wed Apr 4 11:25:25 2018 -0700
Committer: Michael Stack 
Committed: Wed Apr 4 11:25:25 2018 -0700

--
 src/main/asciidoc/_chapters/schema_design.adoc | 5 +
 1 file changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/hbase/blob/8bc72347/src/main/asciidoc/_chapters/schema_design.adoc
--
diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc
index 12d449b..a25b85e 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -1173,6 +1173,11 @@ Otherwise, shortcircuit read operates quietly in background. It does not provide
 no optics on how effective it is but read latencies should show a marked improvement, especially if
 good data locality, lots of random reads, and dataset is larger than available cache.
 
+Other advanced configurations that you might play with, especially if shortcircuit functionality
+is complaining in the logs,  include `dfs.client.read.shortcircuit.streams.cache.size` and
+`dfs.client.socketcache.capacity`. Documentation is sparse on these options. You'll have to
+read source code.
+
 For more on short-circuit reads, see Colin's old blog on rollout,
 link:http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/[How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop].
 The link:https://issues.apache.org/jira/browse/HDFS-347[HDFS-347] issue also makes for an



hbase git commit: HBASE-20337 Update the doc on how to setup shortcircuit reads; its stale

2018-04-04 Thread stack
Repository: hbase
Updated Branches:
  refs/heads/master 0c0fe05bc -> d60decd95


HBASE-20337 Update the doc on how to setup shortcircuit reads; its stale


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/d60decd9
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/d60decd9
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/d60decd9

Branch: refs/heads/master
Commit: d60decd959d4556caa54a3f355e246372d0147e5
Parents: 0c0fe05
Author: Michael Stack 
Authored: Tue Apr 3 10:27:38 2018 -0700
Committer: Michael Stack 
Committed: Wed Apr 4 11:20:58 2018 -0700

--
 src/main/asciidoc/_chapters/schema_design.adoc | 26 ++---
 1 file changed, 23 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/hbase/blob/d60decd9/src/main/asciidoc/_chapters/schema_design.adoc
--
diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc
index 4cd7656..12d449b 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -1148,16 +1148,36 @@ Detect regionserver failure as fast as reasonable. Set the following parameters:
 - `dfs.namenode.avoid.read.stale.datanode = true`
 - `dfs.namenode.avoid.write.stale.datanode = true`
 
+[[shortcircuit.reads]]
 ===  Optimize on the Server Side for Low Latency
-
-* Skip the network for local blocks. In `hbase-site.xml`, set the following parameters:
+Skip the network for local blocks when the RegionServer goes to read from HDFS by exploiting HDFS's
+link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html[Short-Circuit Local Reads] facility.
+Note how setup must be done both at the datanode and on the dfsclient ends of the conneciton -- i.e. at the RegionServer
+and how both ends need to have loaded the hadoop native `.so` library.
+After configuring your hadoop setting _dfs.client.read.shortcircuit_ to _true_ and configuring
+the _dfs.domain.socket.path_ path for the datanode and dfsclient to share and restarting, next configure
+the regionserver/dfsclient side.
+
+* In `hbase-site.xml`, set the following parameters:
 - `dfs.client.read.shortcircuit = true`
-- `dfs.client.read.shortcircuit.buffer.size = 131072` (Important to avoid OOME)
+- `dfs.client.read.shortcircuit.skip.checksum = true` so we don't double checksum (HBase does its own checksumming to save on i/os. See <> for more on this.
+- `dfs.domain.socket.path` to match what was set for the datanodes.
+- `dfs.client.read.shortcircuit.buffer.size = 131072` Important to avoid OOME -- hbase has a default it uses if unset, see `hbase.dfs.client.read.shortcircuit.buffer.size`; its default is 131072.
 * Ensure data locality. In `hbase-site.xml`, set `hbase.hstore.min.locality.to.skip.major.compact = 0.7` (Meaning that 0.7 \<= n \<= 1)
 * Make sure DataNodes have enough handlers for block transfers. In `hdfs-site.xml`, set the following parameters:
 - `dfs.datanode.max.xcievers >= 8192`
 - `dfs.datanode.handler.count =` number of spindles
 
+Check the RegionServer logs after restart. You should only see complaint if misconfiguration.
+Otherwise, shortcircuit read operates quietly in background. It does not provide metrics so
+no optics on how effective it is but read latencies should show a marked improvement, especially if
+good data locality, lots of random reads, and dataset is larger than available cache.
+
+For more on short-circuit reads, see Colin's old blog on rollout,
+link:http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/[How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop].
+The link:https://issues.apache.org/jira/browse/HDFS-347[HDFS-347] issue also makes for an
+interesting read showing the HDFS community at its best (caveat a few comments).
+
 ===  JVM Tuning
 
   Tune JVM GC for low collection latencies