[03/11] accumulo git commit: ACCUMULO-4532 Improve documentation of examples

mwalch Tue, 06 Dec 2016 11:16:31 -0800

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/combiner.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/combiner.md b/docs/src/main/resources/examples/combiner.md
new file mode 100644
index 0000000..03841d3
--- /dev/null
+++ b/docs/src/main/resources/examples/combiner.md
@@ -0,0 +1,72 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Combiner Example
+---
+
+This tutorial uses the following Java class, which can be found in org.apache.accumulo.examples.simple.combiner in the examples-simple module:
+
+ * StatsCombiner.java - a combiner that calculates max, min, sum, and count
+
+This is a simple combiner example. To build it, run Maven and then copy the
+produced jar into the Accumulo lib directory. This is already done in the tar
+distribution.
+
+    $ bin/accumulo shell -u username
+    Enter current password for 'username'@'instance': ***
+
+    Shell - Apache Accumulo Interactive Shell
+    -
+    - version: 1.5.0
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    -
+    - type 'help' for a list of available commands
+    -
+    username@instance> createtable runners
+    username@instance runners> setiter -t runners -p 10 -scan -minc -majc -n decStats -class org.apache.accumulo.examples.simple.combiner.StatsCombiner
+    Combiner that keeps track of min, max, sum, and count
+    ----------> set StatsCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.:
+    ----------> set StatsCombiner parameter columns, <col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non aplhanum chars using %<hex>.: stat
+    ----------> set StatsCombiner parameter radix, radix/base of the numbers: 10
+    username@instance runners> setiter -t runners -p 11 -scan -minc -majc -n hexStats -class org.apache.accumulo.examples.simple.combiner.StatsCombiner
+    Combiner that keeps track of min, max, sum, and count
+    ----------> set StatsCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.:
+    ----------> set StatsCombiner parameter columns, <col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non aplhanum chars using %<hex>.: hstat
+    ----------> set StatsCombiner parameter radix, radix/base of the numbers: 16
+    username@instance runners> insert 123456 name first Joe
+    username@instance runners> insert 123456 stat marathon 240
+    username@instance runners> scan
+    123456 name:first []    Joe
+    123456 stat:marathon []    240,240,240,1
+    username@instance runners> insert 123456 stat marathon 230
+    username@instance runners> insert 123456 stat marathon 220
+    username@instance runners> scan
+    123456 name:first []    Joe
+    123456 stat:marathon []    220,240,690,3
+    username@instance runners> insert 123456 hstat virtualMarathon 6a
+    username@instance runners> insert 123456 hstat virtualMarathon 6b
+    username@instance runners> scan
+    123456 hstat:virtualMarathon []    6a,6b,d5,2
+    123456 name:first []    Joe
+    123456 stat:marathon []    220,240,690,3
+
+In this example a table is created and the example stats combiner is applied to
+the column families stat and hstat. The stats combiner computes min, max, sum,
+and count. It can be configured to use a different base or radix. In the example
+above, the column family stat is configured for base 10 and the column family
+hstat is configured for base 16.
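The arithmetic behind the `min,max,sum,count` values in the scans above can be sketched in plain Java. This is a standalone illustration, not the StatsCombiner source: `StatsReduceSketch` and `reduce` are hypothetical names, and it assumes each stored value is either a single number or an already combined `min,max,sum,count` string, both written in the configured radix, which is consistent with the scan output shown.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch; not the StatsCombiner implementation.
class StatsReduceSketch {

  // Combine raw values ("240") and previously combined values
  // ("220,240,690,3"), all written in the given radix.
  static String reduce(List<String> values, int radix) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long sum = 0;
    long count = 0;
    for (String v : values) {
      String[] parts = v.split(",");
      if (parts.length == 1) {
        // A single inserted number contributes once to every statistic.
        long x = Long.parseLong(parts[0], radix);
        min = Math.min(min, x);
        max = Math.max(max, x);
        sum += x;
        count += 1;
      } else {
        // An already combined value carries its own min, max, sum, and count.
        min = Math.min(min, Long.parseLong(parts[0], radix));
        max = Math.max(max, Long.parseLong(parts[1], radix));
        sum += Long.parseLong(parts[2], radix);
        count += Long.parseLong(parts[3], radix);
      }
    }
    // Write the result back in the same radix.
    return Long.toString(min, radix) + "," + Long.toString(max, radix) + ","
        + Long.toString(sum, radix) + "," + Long.toString(count, radix);
  }

  public static void main(String[] args) {
    // Values taken from the shell session above.
    System.out.println(reduce(Arrays.asList("240,240,240,1", "230", "220"), 10)); // 220,240,690,3
    System.out.println(reduce(Arrays.asList("6a", "6b"), 16));                    // 6a,6b,d5,2
  }
}
```

Accepting both shapes of input matters because a combiner may see a mix of freshly inserted values and values it already combined during an earlier compaction.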


http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/compactionStrategy.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/compactionStrategy.md b/docs/src/main/resources/examples/compactionStrategy.md
new file mode 100644
index 0000000..642c3ea
--- /dev/null
+++ b/docs/src/main/resources/examples/compactionStrategy.md
@@ -0,0 +1,67 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Customizing the Compaction Strategy
+---
+
+This tutorial uses the following Java classes, which can be found in org.apache.accumulo.tserver.compaction:
+
+ * DefaultCompactionStrategy.java - determines which files to compact based on table.compaction.major.ratio and table.file.max
+ * EverythingCompactionStrategy.java - compacts all files
+ * SizeLimitCompactionStrategy.java - compacts files no bigger than table.majc.compaction.strategy.opts.sizeLimit
+ * TwoTierCompactionStrategy.java - uses default compression for smaller files and table.majc.compaction.strategy.opts.file.large.compress.type for larger files
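The size-ratio test that table.compaction.major.ratio controls can be sketched in plain Java. This assumes the rule as the property is usually described (the minimum ratio of total input size to the largest input file size); `RatioCheckSketch` and `meetsRatio` are hypothetical names, and the real DefaultCompactionStrategy also chooses which candidate files to consider and honors table.file.max, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the ratio rule only; not DefaultCompactionStrategy.
class RatioCheckSketch {

  // A candidate set qualifies when total size >= ratio * largest file size.
  static boolean meetsRatio(List<Long> fileSizes, double ratio) {
    long total = 0;
    long largest = 0;
    for (long size : fileSizes) {
      total += size;
      largest = Math.max(largest, size);
    }
    // An empty or all-zero set never qualifies.
    return largest > 0 && total >= ratio * largest;
  }

  public static void main(String[] args) {
    // Four similar files: 40 >= 3 * 10, so the set qualifies.
    System.out.println(meetsRatio(Arrays.asList(10L, 10L, 10L, 10L), 3.0)); // true
    // One dominant file: 120 < 3 * 100, so it does not.
    System.out.println(meetsRatio(Arrays.asList(100L, 10L, 10L), 3.0));     // false
  }
}
```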
+
+This is an example of how to configure a compaction strategy. By default,
+Accumulo always uses the DefaultCompactionStrategy unless these steps are taken
+to change the configuration. Use the strategy and settings that best fit your
+Accumulo setup. This example shows how to configure and test one of the more
+complicated strategies, the TwoTierCompactionStrategy. Note that this example
+requires Hadoop native libraries built with snappy in order to use snappy
+compression.
+
+To begin, run the command to create a table for testing:
+
+    $ ./bin/accumulo shell -u root -p secret -e "createtable test1"
+
+The command below sets the compression for smaller files and minor compactions for that table.
+
+    $ ./bin/accumulo shell -u root -p secret -e "config -s table.file.compress.type=snappy -t test1"
+
+The commands below will configure the TwoTierCompactionStrategy to use gz compression for files larger than 1M.
+
+    $ ./bin/accumulo shell -u root -p secret -e "config -s table.majc.compaction.strategy.opts.file.large.compress.threshold=1M -t test1"
+    $ ./bin/accumulo shell -u root -p secret -e "config -s table.majc.compaction.strategy.opts.file.large.compress.type=gz -t test1"
+    $ ./bin/accumulo shell -u root -p secret -e "config -s table.majc.compaction.strategy=org.apache.accumulo.tserver.compaction.TwoTierCompactionStrategy -t test1"
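The compression choice these settings configure can be sketched in plain Java. `TwoTierSketch`, `parseSize`, and `chooseCompression` are hypothetical names, and two details are assumptions rather than facts taken from the TwoTierCompactionStrategy source: that the decision compares the combined size of the files being compacted against the threshold, and that a suffix such as `1M` uses binary multiples (K=1024, M=1024^2, G=1024^3).

```java
// Hypothetical sketch of the two-tier decision; not the Accumulo source.
class TwoTierSketch {

  // Parse sizes such as "1M", "512K", or "2048" into bytes (binary multiples assumed).
  static long parseSize(String s) {
    String t = s.trim().toUpperCase();
    char unit = t.charAt(t.length() - 1);
    long multiplier = 1;
    if (unit == 'K') {
      multiplier = 1L << 10;
    } else if (unit == 'M') {
      multiplier = 1L << 20;
    } else if (unit == 'G') {
      multiplier = 1L << 30;
    }
    String digits = multiplier == 1 ? t : t.substring(0, t.length() - 1);
    return Long.parseLong(digits) * multiplier;
  }

  // Pick the codec for a compaction whose input files total totalBytes.
  static String chooseCompression(long totalBytes, String threshold,
      String smallType, String largeType) {
    return totalBytes > parseSize(threshold) ? largeType : smallType;
  }

  public static void main(String[] args) {
    // With the test1 settings above: snappy below 1M, gz above it.
    System.out.println(chooseCompression(500_000L, "1M", "snappy", "gz"));   // snappy
    System.out.println(chooseCompression(5_000_000L, "1M", "snappy", "gz")); // gz
  }
}
```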
+
+Generate some data and files in order to test the strategy:
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance17 -z localhost:2181 -u root -p secret -t test1 --start 0 --num 10000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/accumulo shell -u root -p secret -e "flush -t test1"
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance17 -z localhost:2181 -u root -p secret -t test1 --start 0 --num 11000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/accumulo shell -u root -p secret -e "flush -t test1"
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance17 -z localhost:2181 -u root -p secret -t test1 --start 0 --num 12000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/accumulo shell -u root -p secret -e "flush -t test1"
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance17 -z localhost:2181 -u root -p secret -t test1 --start 0 --num 13000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/accumulo shell -u root -p secret -e "flush -t test1"
+
+View the tserver log in <accumulo_home>/logs for the compaction and find the name of the <rfile> that was compacted for your table. Print info about this file using the PrintInfo tool:
+
+    $ ./bin/accumulo rfile-info <rfile>
+
+Details about the rfile will be printed, and the compression type should match
+the type used in the compaction:
+
+    Meta block     : RFile.index
+          Raw size             : 512 bytes
+          Compressed size      : 278 bytes
+          Compression type     : gz
+

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/constraints.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/constraints.md b/docs/src/main/resources/examples/constraints.md
new file mode 100644
index 0000000..4f23aab
--- /dev/null
+++ b/docs/src/main/resources/examples/constraints.md
@@ -0,0 +1,56 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Constraints Example
+---
+
+This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.simple.constraints in the examples-simple module:
+
+ * AlphaNumKeyConstraint.java - a constraint that requires alphanumeric keys
+ * NumericValueConstraint.java - a constraint that requires numeric string values
+
+This is an example of how to create a table with constraints. Below, a table is
+created with two example constraints. One constraint does not allow
+non-alphanumeric keys. The other constraint does not allow non-numeric values.
+Two inserts that violate these constraints are attempted and denied. The scan at
+the end shows the inserts were not allowed.
+
+    $ ./bin/accumulo shell -u username -p password
+
+    Shell - Apache Accumulo Interactive Shell
+    -
+    - version: 1.5.0
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    -
+    - type 'help' for a list of available commands
+    -
+    username@instance> createtable testConstraints
+    username@instance testConstraints> constraint -a org.apache.accumulo.examples.simple.constraints.NumericValueConstraint
+    username@instance testConstraints> constraint -a org.apache.accumulo.examples.simple.constraints.AlphaNumKeyConstraint
+    username@instance testConstraints> insert r1 cf1 cq1 1111
+    username@instance testConstraints> insert r1 cf1 cq1 ABC
+      Constraint Failures:
+          ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.NumericValueConstraint, violationCode:1, violationDescription:Value is not numeric, numberOfViolatingMutations:1)
+    username@instance testConstraints> insert r1! cf1 cq1 ABC
+      Constraint Failures:
+          ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.NumericValueConstraint, violationCode:1, violationDescription:Value is not numeric, numberOfViolatingMutations:1)
+          ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.AlphaNumKeyConstraint, violationCode:1, violationDescription:Row was not alpha numeric, numberOfViolatingMutations:1)
+    username@instance testConstraints> scan
+    r1 cf1:cq1 []    1111
+    username@instance testConstraints>
+
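The checks the two example constraints apply can be sketched in plain Java, without the Accumulo Constraint API. `ConstraintSketch` and `check` are hypothetical names, and the sketch assumes "alphanumeric" means ASCII letters and digits and "numeric" means ASCII digits; it also assumes the key check covers row, column family, and column qualifier, since the text says keys must be alphanumeric, although the session above only shows a row violation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the AlphaNumKeyConstraint/NumericValueConstraint source.
class ConstraintSketch {

  // True when every character is an ASCII letter or digit.
  static boolean isAlphaNum(String s) {
    for (char c : s.toCharArray()) {
      boolean letterOrDigit = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9');
      if (!letterOrDigit) {
        return false;
      }
    }
    return true;
  }

  // True when every character is an ASCII digit.
  static boolean isNumeric(String s) {
    for (char c : s.toCharArray()) {
      if (c < '0' || c > '9') {
        return false;
      }
    }
    return true;
  }

  // Names each part of an insert that would be rejected; empty means accepted.
  static List<String> check(String row, String family, String qualifier, String value) {
    List<String> violations = new ArrayList<>();
    if (!isNumeric(value)) {
      violations.add("value");
    }
    if (!isAlphaNum(row)) {
      violations.add("row");
    }
    if (!isAlphaNum(family)) {
      violations.add("family");
    }
    if (!isAlphaNum(qualifier)) {
      violations.add("qualifier");
    }
    return violations;
  }

  public static void main(String[] args) {
    // The three inserts from the session above.
    System.out.println(check("r1", "cf1", "cq1", "1111")); // []
    System.out.println(check("r1", "cf1", "cq1", "ABC"));  // [value]
    System.out.println(check("r1!", "cf1", "cq1", "ABC")); // [value, row]
  }
}
```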

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/dirlist.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/dirlist.md b/docs/src/main/resources/examples/dirlist.md
new file mode 100644
index 0000000..1b6a15c
--- /dev/null
+++ b/docs/src/main/resources/examples/dirlist.md
@@ -0,0 +1,118 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo File System Archive
+---
+
+This example stores filesystem information in Accumulo. The example stores the information in the following three tables. More information about the table structures can be found at the end of this document.
+
+ * directory table : This table stores information about the filesystem directory structure.
+ * index table     : This table stores a file name index. It can be used to quickly find files with a given name, suffix, or prefix.
+ * data table      : This table stores the file data. Files with duplicate data are only stored once.
+
+This example shows how to use Accumulo to store a file system history. It has the following classes:
+
+ * Ingest.java - Recursively lists the files and directories under a given path, ingests their names and file info into one Accumulo table, indexes the file names in a separate table, and ingests the file data into a third table.
+ * QueryUtil.java - Provides utility methods for getting the info for a file, listing the contents of a directory, and performing single wildcard searches on file or directory names.
+ * Viewer.java - Provides a GUI for browsing the file system information stored in Accumulo.
+ * FileCount.java - Computes recursive counts over file system information and stores them back into the same Accumulo table.
+
+To begin, ingest some data with Ingest.java.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.Ingest -i instance -z zookeepers -u username -p password --vis exampleVis --chunkSize 100000 /local/username/workspace
+
+This may take some time if there are large files in the /local/username/workspace directory. If you use 0 instead of 100000 on the command line, the ingest will run much faster, but it will not put any file data into Accumulo (the dataTable will be empty).
+Note that running this example will create tables dirTable, indexTable, and dataTable in Accumulo that you should delete when you have completed the example.
+If you modify a file or add new files in the directory ingested (e.g. /local/username/workspace), you can run Ingest again to add new information into the Accumulo tables.
+
+To browse the data ingested, use Viewer.java. Be sure to give the "username" user the authorizations to see the data. In this case, run:
+
+    $ ./bin/accumulo shell -u root -e 'setauths -u username -s exampleVis'
+
+Then run the Viewer:
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.Viewer -i instance -z zookeepers -u username -p password -t dirTable --dataTable dataTable --auths exampleVis --path /local/username/workspace
+
+To list the contents of specific directories, use QueryUtil.java.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t dirTable --auths exampleVis --path /local/username
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t dirTable --auths exampleVis --path /local/username/workspace
+
+To perform searches on file or directory names, also use QueryUtil.java. Search terms must contain no more than one wildcard and cannot contain "/".
+*Note* these queries run on the _indexTable_ table instead of the dirTable table.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t indexTable --auths exampleVis --path filename --search
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t indexTable --auths exampleVis --path 'filename*' --search
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t indexTable --auths exampleVis --path '*jar' --search
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t indexTable --auths exampleVis --path 'filename*jar' --search
+
+To count the number of direct children (directories and files) and descendants (children and children's descendants, directories and files), run the FileCount over the dirTable table.
+The results are written back to the same table. FileCount reads from and writes to Accumulo. This requires scan authorizations for the read and a visibility for the data written.
+In this example, the authorizations and visibility are set to the same value, exampleVis. See the [visibility example][vis] for more information on visibility and authorizations.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.FileCount -i instance -z zookeepers -u username -p password -t dirTable --auths exampleVis
+
+## Directory Table
+
+Here is an illustration of what data looks like in the directory table:
+
+    row colf:colq [vis]        value
+    000 dir:exec [exampleVis]    true
+    000 dir:hidden [exampleVis]    false
+    000 dir:lastmod [exampleVis]    1291996886000
+    000 dir:length [exampleVis]    1666
+    001/local dir:exec [exampleVis]    true
+    001/local dir:hidden [exampleVis]    false
+    001/local dir:lastmod [exampleVis]    1304945270000
+    001/local dir:length [exampleVis]    272
+    002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:exec [exampleVis]    false
+    002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:hidden [exampleVis]    false
+    002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:lastmod [exampleVis]    1308746481000
+    002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:length [exampleVis]    9192
+    002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:md5 [exampleVis]    274af6419a3c4c4a259260ac7017cbf1
+
+The rows are of the form depth + path, where depth is the number of slashes ("/") in the path padded to 3 digits. This is so that all the children of a directory appear as consecutive keys in Accumulo; without the depth, you would for example see all the subdirectories of /local before you saw /usr.
+For directories the column family is "dir". For files the column family is Long.MAX_VALUE - lastModified in bytes rather than string format so that newer versions sort earlier.
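The row and file column family layout described above can be sketched in plain Java. `DirRowSketch` and its methods are hypothetical helpers, not the Ingest or QueryUtil source, and the 8-byte big-endian encoding of `Long.MAX_VALUE - lastModified` is an assumption; it does, however, reproduce the `\x7F\xFF\xFE\xCFH\xA1\x82\x97` family shown for Accumulo.README (lastmod 1308746481000).

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the directory-table key layout.
class DirRowSketch {

  // Depth (number of '/' characters) padded to 3 digits, followed by the path.
  static String row(String path) {
    int depth = 0;
    for (char c : path.toCharArray()) {
      if (c == '/') {
        depth++;
      }
    }
    return String.format("%03d", depth) + path;
  }

  // Reversed timestamp as 8 big-endian bytes: newer files get smaller values,
  // so they sort first under Accumulo's byte-wise key ordering.
  static byte[] fileFamily(long lastModified) {
    return ByteBuffer.allocate(8).putLong(Long.MAX_VALUE - lastModified).array();
  }

  // Render bytes in the \xNN notation used in the table above
  // (printable ASCII as-is, everything else escaped).
  static String shellEscape(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      int v = b & 0xFF;
      if (v >= 0x20 && v < 0x7F) {
        sb.append((char) v);
      } else {
        sb.append(String.format("\\x%02X", v));
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(row("/local/Accumulo.README"));                 // 002/local/Accumulo.README
    System.out.println(shellEscape(fileFamily(1308746481000L)));       // \x7F\xFF\xFE\xCFH\xA1\x82\x97
  }
}
```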
+
+## Index Table
+
+Here is an illustration of what data looks like in the index table:
+
+    row colf:colq [vis]
+    fAccumulo.README i:002/local/Accumulo.README [exampleVis]
+    flocal i:001/local [exampleVis]
+    rEMDAER.olumuccA i:002/local/Accumulo.README [exampleVis]
+    rlacol i:001/local [exampleVis]
+
+The values of the index table are null. The rows are of the form "f" + filename or "r" + reverse file name. This is to enable searches with wildcards at the beginning, middle, or end.
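The index rows, and one plausible way a single-wildcard term maps onto them, can be sketched in plain Java. `IndexRowSketch` and its methods are hypothetical; the row formats match the table above, but the wildcard handling is an assumption and QueryUtil may implement the middle-wildcard case differently (for example, with an extra filter on the scan).

```java
// Hypothetical sketch of index-table rows and wildcard-to-prefix mapping.
class IndexRowSketch {

  // Forward index row: "f" + file name.
  static String forwardRow(String name) {
    return "f" + name;
  }

  // Reverse index row: "r" + file name with its characters reversed.
  static String reverseRow(String name) {
    return "r" + new StringBuilder(name).reverse();
  }

  // Row (or row prefix) to scan for a term with at most one '*':
  //   "name"    -> the exact forward row
  //   "pre*"    -> forward rows starting with "f" + "pre"
  //   "*suf"    -> reverse rows starting with "r" + reversed "suf"
  //   "pre*suf" -> forward rows starting with "f" + "pre"; each match must
  //                then also be checked for the suffix "suf"
  static String scanPrefix(String term) {
    if (term.contains("/")) {
      throw new IllegalArgumentException("search terms cannot contain '/'");
    }
    int star = term.indexOf('*');
    if (star < 0) {
      return forwardRow(term);
    }
    if (term.indexOf('*', star + 1) >= 0) {
      throw new IllegalArgumentException("at most one wildcard is allowed");
    }
    if (star == 0) {
      return reverseRow(term.substring(1));
    }
    return forwardRow(term.substring(0, star));
  }

  public static void main(String[] args) {
    System.out.println(reverseRow("Accumulo.README")); // rEMDAER.olumuccA
    System.out.println(scanPrefix("*jar"));            // rraj
    System.out.println(scanPrefix("filename*jar"));    // ffilename
  }
}
```

The reversed rows are what make a leading wildcard cheap: "*jar" becomes an ordinary prefix scan on "rraj" instead of a full table scan.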
+
+## Data Table
+
+Here is an illustration of what data looks like in the data table:
+
+    row colf:colq [vis]        value
+    274af6419a3c4c4a259260ac7017cbf1 refs:e77276a2b56e5c15b540eaae32b12c69\x00filext [exampleVis]    README
+    274af6419a3c4c4a259260ac7017cbf1 refs:e77276a2b56e5c15b540eaae32b12c69\x00name [exampleVis]    /local/Accumulo.README
+    274af6419a3c4c4a259260ac7017cbf1 ~chunk:\x00\x0FB@\x00\x00\x00\x00 [exampleVis]    *******************************************************************************\x0A1. Building\x0A\x0AIn the normal tarball release of accumulo, [truncated]
+    274af6419a3c4c4a259260ac7017cbf1 ~chunk:\x00\x0FB@\x00\x00\x00\x01 [exampleVis]
+
+The rows are the md5 hash of the file. Some column family : column qualifier pairs are "refs" : hash of file name + null byte + property name, in which case the value is property value. There can be multiple references to the same file which are distinguished by the hash of the file name.
+Other column family : column qualifier pairs are "~chunk" : chunk size in bytes + chunk number in bytes, in which case the value is the bytes for that chunk of the file. There is an end of file data marker whose chunk number is the number of chunks for the file and whose value is empty.
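The two qualifier layouts can be sketched in plain Java. `DataKeySketch` and its methods are hypothetical helpers; encoding chunk size and chunk number as two 4-byte big-endian ints is an assumption, but it reproduces the `\x00\x0FB@\x00\x00\x00\x00` qualifier above (a 1000000-byte chunk size, chunk 0), and the end-of-data marker in that table is the same layout with chunk number 1.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of data-table column qualifiers.
class DataKeySketch {

  // "~chunk" qualifier: chunk size then chunk number, each as 4 big-endian bytes.
  static byte[] chunkQualifier(int chunkSize, int chunkNumber) {
    return ByteBuffer.allocate(8).putInt(chunkSize).putInt(chunkNumber).array();
  }

  // "refs" qualifier: hash of the file name, a null byte, then the property name.
  static String refQualifier(String nameHash, String property) {
    return nameHash + "\0" + property;
  }

  // Split a null-byte separated qualifier back into its parts.
  static String[] splitNull(String qualifier) {
    return qualifier.split("\0", -1);
  }

  public static void main(String[] args) {
    StringBuilder hex = new StringBuilder();
    for (byte b : chunkQualifier(1000000, 0)) {
      hex.append(String.format("%02X ", b & 0xFF));
    }
    System.out.println(hex.toString().trim()); // 00 0F 42 40 00 00 00 00
    String[] parts = splitNull(refQualifier("e77276a2b56e5c15b540eaae32b12c69", "name"));
    System.out.println(parts[0] + " / " + parts[1]);
  }
}
```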
+
+There may exist multiple copies of the same file (with the same md5 hash) with different chunk sizes or different visibilities. There is an iterator that can be set on the data table that combines these copies into a single copy with a visibility taken from the visibilities of the file references, e.g. (vis from ref1)|(vis from ref2).
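Building the combined `(vis from ref1)|(vis from ref2)` expression can be sketched in plain Java. `VisibilityMergeSketch` and `merge` are hypothetical and simplified: duplicates are removed and the order is sorted for stability, expressions that already contain a top-level `|` are not re-parsed, and an empty visibility (readable by everyone) makes the whole result unrestricted, since OR-ing with it cannot restrict access.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified sketch; not the example's iterator or VisibilityCombiner.
class VisibilityMergeSketch {

  // OR together the visibilities of every reference to the same file data.
  static String merge(List<String> visibilities) {
    TreeSet<String> unique = new TreeSet<>(); // removes duplicates, sorts
    for (String v : visibilities) {
      if (v.isEmpty()) {
        // An empty visibility is readable by everyone, so the OR is unrestricted.
        return "";
      }
      unique.add(v);
    }
    StringBuilder sb = new StringBuilder();
    for (String v : unique) {
      if (sb.length() > 0) {
        sb.append('|');
      }
      sb.append('(').append(v).append(')');
    }
    // Note: an empty input list also yields "" here.
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(merge(Arrays.asList("B", "A", "A"))); // (A)|(B)
  }
}
```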
+
+[vis]: visibility.md

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/export.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/export.md b/docs/src/main/resources/examples/export.md
new file mode 100644
index 0000000..beb7b99
--- /dev/null
+++ b/docs/src/main/resources/examples/export.md
@@ -0,0 +1,93 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Export/Import Example
+---
+
+Accumulo provides a mechanism to export and import tables. This example shows
+how to use this feature.
+
+The shell session below shows creating a table, inserting data, and exporting
+the table. A table must be offline to export it, and it should remain offline
+for the duration of the distcp. An easy way to take a table offline without
+interrupting access to it is to clone it and take the clone offline.
+
+    root@test15> createtable table1
+    root@test15 table1> insert a cf1 cq1 v1
+    root@test15 table1> insert h cf1 cq1 v2
+    root@test15 table1> insert z cf1 cq1 v3
+    root@test15 table1> insert z cf1 cq2 v4
+    root@test15 table1> addsplits -t table1 b r
+    root@test15 table1> scan
+    a cf1:cq1 []    v1
+    h cf1:cq1 []    v2
+    z cf1:cq1 []    v3
+    z cf1:cq2 []    v4
+    root@test15> config -t table1 -s table.split.threshold=100M
+    root@test15 table1> clonetable table1 table1_exp
+    root@test15 table1> offline table1_exp
+    root@test15 table1> exporttable -t table1_exp /tmp/table1_export
+    root@test15 table1> quit
+
+After executing the export command, a few files are created in the HDFS directory.
+One of the files is a list of files to distcp as shown below.
+
+    $ hadoop fs -ls /tmp/table1_export
+    Found 2 items
+    -rw-r--r--   3 user supergroup        162 2012-07-25 09:56 /tmp/table1_export/distcp.txt
+    -rw-r--r--   3 user supergroup        821 2012-07-25 09:56 /tmp/table1_export/exportMetadata.zip
+    $ hadoop fs -cat /tmp/table1_export/distcp.txt
+    hdfs://n1.example.com:6093/accumulo/tables/3/default_tablet/F0000000.rf
+    hdfs://n1.example.com:6093/tmp/table1_export/exportMetadata.zip
+
+Before the table can be imported, it must be copied using distcp. After the
+distcp completes, the cloned table may be deleted.
+
+    $ hadoop distcp -f /tmp/table1_export/distcp.txt /tmp/table1_export_dest
+
+The Accumulo shell session below shows importing the table and inspecting it.
+The data, splits, config, and logical time information for the table were
+preserved.
+
+    root@test15> importtable table1_copy /tmp/table1_export_dest
+    root@test15> table table1_copy
+    root@test15 table1_copy> scan
+    a cf1:cq1 []    v1
+    h cf1:cq1 []    v2
+    z cf1:cq1 []    v3
+    z cf1:cq2 []    v4
+    root@test15 table1_copy> getsplits -t table1_copy
+    b
+    r
+    root@test15> config -t table1_copy -f split
+    ---------+--------------------------+-------------------------------------------
+    SCOPE    | NAME                     | VALUE
+    ---------+--------------------------+-------------------------------------------
+    default  | table.split.threshold .. | 1G
+    table    |    @override ........... | 100M
+    ---------+--------------------------+-------------------------------------------
+    root@test15> tables -l
+    accumulo.metadata    =>        !0
+    accumulo.root        =>        +r
+    table1_copy          =>         5
+    trace                =>         1
+    root@test15 table1_copy> scan -t accumulo.metadata -b 5 -c srv:time
+    5;b srv:time []    M1343224500467
+    5;r srv:time []    M1343224500467
+    5< srv:time []    M1343224500467
+
+

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/filedata.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/filedata.md b/docs/src/main/resources/examples/filedata.md
new file mode 100644
index 0000000..6de2f0a
--- /dev/null
+++ b/docs/src/main/resources/examples/filedata.md
@@ -0,0 +1,51 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo File System Archive Example (Data Only)
+---
+
+This example archives file data into an Accumulo table. Files with duplicate data are only stored once.
+The example has the following classes:
+
+ * CharacterHistogram - A MapReduce job that computes a histogram of byte frequency for each file and stores the histogram alongside the file data. An example use of the ChunkInputFormat.
+ * ChunkCombiner - An Iterator that dedupes file data and sets their visibilities to a combined visibility based on current references to the file data.
+ * ChunkInputFormat - An Accumulo InputFormat that provides keys containing file info (List<Entry<Key,Value>>) and values with an InputStream over the file (ChunkInputStream).
+ * ChunkInputStream - An input stream over file data stored in Accumulo.
+ * FileDataIngest - Takes a list of files and archives them into Accumulo keyed on hashes of the files.
+ * FileDataQuery - Retrieves file data based on the hash of the file. (Used by the dirlist.Viewer.)
+ * KeyUtil - A utility for creating and parsing null-byte separated strings into/from Text objects.
+ * VisibilityCombiner - A utility for merging visibilities into the form (VIS1)|(VIS2)|...
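The per-file computation CharacterHistogram is described as performing, a byte-frequency histogram, can be sketched in plain Java. `ByteHistogramSketch` and `histogram` are hypothetical names; the sketch only counts bytes from an `InputStream` and does not show how the real job encodes or stores the histogram in the 'info' column family.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a byte-frequency histogram over one file's data.
class ByteHistogramSketch {

  // Count how often each byte value (0-255) occurs in the stream.
  static long[] histogram(InputStream in) throws IOException {
    long[] counts = new long[256];
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
      for (int i = 0; i < n; i++) {
        counts[buf[i] & 0xFF]++; // mask so negative bytes map to 128-255
      }
    }
    return counts;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
    long[] counts = histogram(new ByteArrayInputStream(data));
    System.out.println("l occurs " + counts['l'] + " times"); // l occurs 2 times
  }
}
```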
+
+This example is coupled with the [dirlist example][dirlist].
+
+If you haven't already run the [dirlist example][dirlist], ingest a file with FileDataIngest.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.filedata.FileDataIngest -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --chunk 1000 /path/to/accumulo/README.md
+
+Open the Accumulo shell and look at the data. The row is the MD5 hash of the file, which you can verify by running a command such as 'md5sum' on the file.
+
+    > scan -t dataTable
+
+Run the CharacterHistogram MapReduce job to add some information about the file.
+
+    $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.filedata.CharacterHistogram -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --vis exampleVis
+
+Scan again to see the histogram stored in the 'info' column family.
+
+    > scan -t dataTable
+
+[dirlist]: dirlist.md
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/filter.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/filter.md b/docs/src/main/resources/examples/filter.md
new file mode 100644
index 0000000..563e247
--- /dev/null
+++ b/docs/src/main/resources/examples/filter.md
@@ -0,0 +1,112 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Filter Example
+---
+
+This is a simple filter example. It uses the AgeOffFilter that is provided as
+part of the core package org.apache.accumulo.core.iterators.user. Filters are
+iterators that select desired key/value pairs (or weed out undesired ones).
+Filters extend the org.apache.accumulo.core.iterators.Filter class
+and must implement a method accept(Key k, Value v). This method returns true
+if the key/value pair is to be delivered and false if it is to be ignored.
+Filter takes a "negate" parameter which defaults to false. If set to true, the
+return value of the accept method is negated, so that key/value pairs accepted
+by the method are omitted by the Filter.
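
The accept/negate contract can be modeled without any Accumulo classes. This is a hypothetical, stdlib-only sketch in which entries are reduced to their timestamps:

* `ageOffAccept` stands in for AgeOffFilter's accept method. It is assumed here to keep entries that are at most ttl milliseconds old.
* `delivered` shows how the negate option flips the result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stdlib-only model of the Filter contract; it does not use
// Accumulo classes. Entries are reduced to their timestamps (milliseconds).
final class FilterSketch {

    // Stand-in for AgeOffFilter.accept(): keep entries that are at most
    // ttl milliseconds older than currentTime (assumed semantics).
    static boolean ageOffAccept(long timestamp, long currentTime, long ttl) {
        return currentTime - timestamp <= ttl;
    }

    // negate=false delivers what accept() returns true for;
    // negate=true omits exactly those entries instead.
    static boolean delivered(boolean accepted, boolean negate) {
        return accepted != negate;
    }

    // What a scan would return after the filter is applied.
    static List<Long> scan(List<Long> timestamps, long currentTime, long ttl, boolean negate) {
        List<Long> out = new ArrayList<>();
        for (long ts : timestamps) {
            if (delivered(ageOffAccept(ts, currentTime, ttl), negate)) {
                out.add(ts);
            }
        }
        return out;
    }
}
```

Take a 30000 ms ttl and a current time of 50000. Entries stamped 20000 and 45000 are delivered and the one stamped 0 is not. With negate set, only the entry stamped 0 is delivered.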
+
+    username@instance> createtable filtertest
+    username@instance filtertest> setiter -t filtertest -scan -p 10 -n myfilter -ageoff
+    AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+    ----------> set AgeOffFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method:
+    ----------> set AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+    ----------> set AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day:
+    username@instance filtertest> scan
+    username@instance filtertest> insert foo a b c
+    username@instance filtertest> scan
+    foo a:b []    c
+    username@instance filtertest>
+
+... wait 30 seconds ...
+
+    username@instance filtertest> scan
+    username@instance filtertest>
+
+Note the absence of the entry inserted more than 30 seconds ago. Since the
+scope was set to "scan", this means the entry is still in Accumulo, but is
+being filtered out at query time. To delete entries from Accumulo based on
+the ages of their timestamps, AgeOffFilters should be set up for the "minc"
+and "majc" scopes, as well.
+
+To force an ageoff of the persisted data, after setting up the ageoff iterator
+on the "minc" and "majc" scopes you can flush and compact your table. This will
+happen automatically as a background operation on any table that is being
+actively written to, but can also be requested in the shell.
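
This is a minimal model of the difference between scopes, again assuming entries are reduced to timestamps:

* A scan-scope filter only changes the view that a scan returns.
* A filter that runs during compaction decides what is written to the new files.

ScopeSketch and its methods are hypothetical names used only for this illustration. The single `compact` call stands in for the flush and compaction together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of iterator scopes; entries are timestamps only.
final class ScopeSketch {
    // What is actually stored (stand-in for in-memory data plus files).
    final List<Long> persisted = new ArrayList<>();

    // Scan scope: filter the returned view, leave stored data untouched.
    List<Long> scan(long now, long ttl) {
        List<Long> view = new ArrayList<>();
        for (long ts : persisted) {
            if (now - ts <= ttl) {
                view.add(ts);
            }
        }
        return view;
    }

    // minc/majc scope: the rewrite simply does not carry aged-off entries forward.
    void compact(long now, long ttl) {
        persisted.removeIf(ts -> now - ts > ttl);
    }
}
```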
+
+The first setiter command used the special -ageoff flag to specify the
+AgeOffFilter, but any Filter can be configured by using the -class flag. The
+following commands show how to enable the AgeOffFilter for the minc and majc
+scopes using the -class flag, then flush and compact the table.
+
+    username@instance filtertest> setiter -t filtertest -minc -majc -p 10 -n myfilter -class org.apache.accumulo.core.iterators.user.AgeOffFilter
+    AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+    ----------> set AgeOffFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method:
+    ----------> set AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+    ----------> set AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day:
+    username@instance filtertest> flush
+    06 10:42:24,806 [shell.Shell] INFO : Flush of table filtertest initiated...
+    username@instance filtertest> compact
+    06 10:42:36,781 [shell.Shell] INFO : Compaction of table filtertest started for given range
+    username@instance filtertest> flush -t filtertest -w
+    06 10:42:52,881 [shell.Shell] INFO : Flush of table filtertest completed.
+    username@instance filtertest> compact -t filtertest -w
+    06 10:43:00,632 [shell.Shell] INFO : Compacting table ...
+    06 10:43:01,307 [shell.Shell] INFO : Compaction of table filtertest completed for given range
+    username@instance filtertest>
+
+By default, flush and compact execute in the background, but with the -w flag
+they do not return until the operation has completed. Both are
+demonstrated above, though only one call to each would be necessary. A
+specific table can be specified with -t.
+
+After the compaction runs, the newly created files will not contain any data
+that should have been aged off, and the Accumulo garbage collector will remove
+the old files.
+
+To see the iterator settings for a table, use config.
+
+    username@instance filtertest> config -t filtertest -f iterator
+    ---------+---------------------------------------------+---------------------------------------------------------------------------
+    SCOPE    | NAME                                        | VALUE
+    ---------+---------------------------------------------+---------------------------------------------------------------------------
+    table    | table.iterator.majc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
+    table    | table.iterator.majc.myfilter.opt.ttl ...... | 30000
+    table    | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
+    table    | table.iterator.majc.vers.opt.maxVersions .. | 1
+    table    | table.iterator.minc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
+    table    | table.iterator.minc.myfilter.opt.ttl ...... | 30000
+    table    | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
+    table    | table.iterator.minc.vers.opt.maxVersions .. | 1
+    table    | table.iterator.scan.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
+    table    | table.iterator.scan.myfilter.opt.ttl ...... | 30000
+    table    | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
+    table    | table.iterator.scan.vers.opt.maxVersions .. | 1
+    ---------+---------------------------------------------+---------------------------------------------------------------------------
+    username@instance filtertest>
+
+When setting new iterators, make sure to order their priority numbers
+(specified with -p) in the order you would like the iterators to be applied.
+Also, each iterator must have a unique name and priority within each scope.
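
The priority rule can be checked mechanically against the table.iterator.* properties that the config command prints. The following helper is hypothetical and not part of Accumulo. It does three things:

* Groups the properties by scope and skips option keys such as ...opt.ttl.
* Lists each scope's iterators in ascending priority order (for example myfilter at 10 before vers at 20).
* Rejects two iterators that share a priority within one scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not Accumulo's own parser: groups
// "table.iterator.<scope>.<name> = <priority>,<class>" properties by scope
// and orders each scope's iterator names by ascending priority.
final class IteratorOrder {

    static Map<String, List<String>> byScope(Map<String, String> props) {
        // scope -> (priority -> iterator name); TreeMap keeps priorities sorted
        Map<String, TreeMap<Integer, String>> scopes = new TreeMap<>();
        String prefix = "table.iterator.";
        for (Map.Entry<String, String> e : props.entrySet()) {
            String key = e.getKey();
            if (!key.startsWith(prefix)) {
                continue;
            }
            String[] parts = key.substring(prefix.length()).split("\\.");
            if (parts.length != 2) {
                continue; // option keys such as scan.myfilter.opt.ttl
            }
            int priority = Integer.parseInt(e.getValue().split(",", 2)[0].trim());
            TreeMap<Integer, String> ordered = scopes.computeIfAbsent(parts[0], s -> new TreeMap<>());
            String clash = ordered.put(priority, parts[1]);
            if (clash != null) {
                throw new IllegalArgumentException("priority " + priority + " used by both "
                    + clash + " and " + parts[1] + " in scope " + parts[0]);
            }
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Integer, String>> e : scopes.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue().values()));
        }
        return result;
    }
}
```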

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/helloworld.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/helloworld.md b/docs/src/main/resources/examples/helloworld.md
new file mode 100644
index 0000000..bc9f04b
--- /dev/null
+++ b/docs/src/main/resources/examples/helloworld.md
@@ -0,0 +1,49 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Hello World Example
+---
+
+This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.simple.helloworld in the examples-simple module:
+
+ * InsertWithBatchWriter.java - Inserts 10K rows (50K entries) into accumulo with each row having 5 entries
+ * ReadData.java - Reads all data between two rows
+
+Log into the accumulo shell:
+
+    $ ./bin/accumulo shell -u username -p password
+
+Create a table called 'hellotable':
+
+    username@instance> createtable hellotable
+
+Launch a Java program that inserts data with a BatchWriter:
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter -i instance -z zookeepers -u username -p password -t hellotable
+
+On the accumulo status page at the URL below (where 'master' is replaced with the name or IP of your accumulo master), you should see 50K entries.
+
+    http://master:9995/
+
+To view the entries, use the shell to scan the table:
+
+    username@instance> table hellotable
+    username@instance hellotable> scan
+
+You can also use a Java class to scan the table:
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i instance -z zookeepers -u username -p password -t hellotable --startKey row_0 --endKey row_1001
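
Accumulo sorts row IDs lexicographically by bytes rather than numerically, which matters when choosing --startKey and --endKey. The sketch below makes two assumptions for illustration only:

* Rows are named row_0 through row_9999 without zero padding.
* The range is inclusive at both ends.

Under those assumptions, only six rows fall between row_0 and row_1001. RowOrderSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// ASSUMES rows named "row_<i>" with no zero padding; the naming scheme is
// an assumption for illustration, not taken from the example's source.
final class RowOrderSketch {
    static List<String> rowsInRange(int numRows, String start, String end) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            String row = "row_" + i;
            // String.compareTo agrees with byte order for these ASCII row IDs
            if (row.compareTo(start) >= 0 && row.compareTo(end) <= 0) {
                hits.add(row);
            }
        }
        Collections.sort(hits); // lexicographic order, as a scan would return them
        return hits;
    }
}
```

Under these assumptions the result is row_0, row_1, row_10, row_100, row_1000, row_1001. Rows such as row_2 or row_500 sort after row_1001 and are excluded.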

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/index.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/index.md b/docs/src/main/resources/examples/index.md
new file mode 100644
index 0000000..efb55f6
--- /dev/null
+++ b/docs/src/main/resources/examples/index.md
@@ -0,0 +1,100 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Examples
+---
+
+## Setup instructions
+
+Before running any of the examples, the following steps must be performed.
+
+1. Install and run Accumulo via the instructions found in INSTALL.md.
+   Remember the instance name. It will be referred to as "instance" throughout
+   the examples. A comma-separated list of zookeeper servers will be referred
+   to as "zookeepers".
+
+2. Create an Accumulo user (for help see the 'User Administration' section of the
+   [user manual][manual]), or use the root user. This user and their password
+   should replace any reference to "username" or "password" in the examples. This
+   user needs the ability to create tables.
+
+In all commands, you will need to replace "instance", "zookeepers",
+"username", and "password" with the values you set for your Accumulo instance.
+
+Commands intended to be run in bash are prefixed by '$'. These are always
+assumed to be run from the root of your Accumulo installation.
+
+Commands intended to be run in the Accumulo shell are prefixed by '>'.
+
+## Accumulo Examples
+
+Each example below highlights a feature of Apache Accumulo.
+
+| Accumulo Example | Description |
+|------------------|-------------|
+| [batch] | Using the batch writer and batch scanner |
+| [bloom] | Creating a bloom filter enabled table to increase query performance |
+| [bulkIngest] | Ingesting bulk data using map/reduce jobs on Hadoop |
+| [classpath] | Using per-table classpaths |
+| [client] | Using table operations, reading and writing data in Java. |
+| [combiner] | Using example StatsCombiner to find min, max, sum, and count. |
+| [compactionStrategy] | Configuring a compaction strategy |
+| [constraints] | Using constraints with tables. |
+| [dirlist] | Storing filesystem information. |
+| [export] | Exporting and importing tables. |
+| [filedata] | Storing file data. |
+| [filter] | Using the AgeOffFilter to remove records more than 30 seconds old. |
+| [helloworld] | Inserting records with a BatchWriter and reading records between two rows. |
+| [isolation] | Using the isolated scanner to ensure partial changes are not seen. |
+| [mapred] | Using MapReduce to read from and write to Accumulo tables. |
+| [maxmutation] | Limiting mutation size to avoid running out of memory. |
+| [regex] | Using MapReduce and Accumulo to find data using regular expressions. |
+| [reservations] | Using conditional mutations to implement a simple reservation system. |
+| [rgbalancer] | Using a balancer to spread groups of tablets within a table evenly |
+| [rowhash] | Using MapReduce to read a table and write to a new column in the same table. |
+| [sample] | Building and using sample data in Accumulo. |
+| [shard] | Using the intersecting iterator with a term index partitioned by document. |
+| [tabletofile] | Using MapReduce to read a table and write one of its columns to a file in HDFS. |
+| [terasort] | Generating random data and sorting it using Accumulo. |
+| [visibility] | Using visibilities (or combinations of authorizations). Also shows user permissions. |
+
+[manual]: https://accumulo.apache.org/latest/accumulo_user_manual/
+[batch]: batch.md
+[bloom]: bloom.md
+[bulkIngest]: bulkIngest.md
+[classpath]: classpath.md
+[client]: client.md 
+[combiner]: combiner.md
+[compactionStrategy]: compactionStrategy.md
+[constraints]: constraints.md
+[dirlist]: dirlist.md
+[export]: export.md
+[filedata]: filedata.md
+[filter]: filter.md
+[helloworld]: helloworld.md
+[isolation]: isolation.md
+[mapred]: mapred.md
+[maxmutation]: maxmutation.md
+[regex]: regex.md
+[reservations]: reservations.md
+[rgbalancer]: rgbalancer.md
+[rowhash]: rowhash.md
+[sample]: sample.md
+[shard]: shard.md
+[tabletofile]: tabletofile.md
+[terasort]: terasort.md
+[visibility]: visibility.md

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/isolation.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/isolation.md b/docs/src/main/resources/examples/isolation.md
new file mode 100644
index 0000000..9b4e0af
--- /dev/null
+++ b/docs/src/main/resources/examples/isolation.md
@@ -0,0 +1,51 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Isolation Example
+---
+
+Accumulo has an isolated scanner that ensures partial changes to rows are not
+seen. Isolation is documented in ../docs/isolation.html and the user manual.
+
+InterferenceTest is a simple example that shows the effects of scanning with
+and without isolation. This program starts two threads. One thread
+continually updates all of the values in a row to be the same thing, but
+different from what it used to be. The other thread continually scans the
+table and checks that all values in a row are the same. Without isolation the
+scanning thread will sometimes see different values, which is the result of
+reading the row at the same time a mutation is changing the row.
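
The check that InterferenceTest performs can be modeled in a single thread. In this hypothetical sketch, a row is a map from column to value. `partialView` shows what a reader without isolation might observe if it read the row after only some columns had been updated. An isolated scanner should only ever return the row as it was before or after the whole mutation. The values mirror the first ERROR line below, but the class and method names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Single-threaded, hypothetical model of the InterferenceTest check.
final class IsolationSketch {

    // The check: every column in a row should hold the same value.
    static Set<Integer> distinctValues(Map<String, Integer> row) {
        return new TreeSet<>(row.values());
    }

    // What a non-isolated reader could see if it read the row after only
    // the first `applied` columns had been set to newValue.
    static Map<String, Integer> partialView(Map<String, Integer> row, int newValue, int applied) {
        Map<String, Integer> view = new LinkedHashMap<>(row);
        int i = 0;
        for (String col : row.keySet()) {
            if (i++ >= applied) {
                break;
            }
            view.put(col, newValue);
        }
        return view;
    }
}
```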
+
+Below, InterferenceTest is run without isolation enabled for 5000 iterations
+and it reports problems.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.isolation.InterferenceTest -i instance -z zookeepers -u username -p password -t isotest --iterations 5000
+    ERROR Columns in row 053 had multiple values [53, 4553]
+    ERROR Columns in row 061 had multiple values [561, 61]
+    ERROR Columns in row 070 had multiple values [570, 1070]
+    ERROR Columns in row 079 had multiple values [1079, 1579]
+    ERROR Columns in row 088 had multiple values [2588, 1588]
+    ERROR Columns in row 106 had multiple values [2606, 3106]
+    ERROR Columns in row 115 had multiple values [4615, 3115]
+    finished
+
+Below, InterferenceTest is run with isolation enabled for 5000 iterations and
+it reports no problems.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.isolation.InterferenceTest -i instance -z zookeepers -u username -p password -t isotest --iterations 5000 --isolated
+    finished
+
+

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/mapred.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/mapred.md b/docs/src/main/resources/examples/mapred.md
new file mode 100644
index 0000000..e1a49eb
--- /dev/null
+++ b/docs/src/main/resources/examples/mapred.md
@@ -0,0 +1,156 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo MapReduce Example
+---
+
+This example uses mapreduce and accumulo to compute word counts for a set of
+documents. This is accomplished using a map-only mapreduce job and an
+accumulo table with combiners.
+
+To run this example you will need a directory in HDFS containing text files.
+The accumulo readme will be used to show how to run this example.
+
+    $ hadoop fs -copyFromLocal /path/to/accumulo/README.md /user/username/wc/Accumulo.README
+    $ hadoop fs -ls /user/username/wc
+    Found 1 items
+    -rw-r--r--   2 username supergroup       9359 2009-07-15 17:54 /user/username/wc/Accumulo.README
+
+The first part of running this example is to create a table with a combiner
+for the column family 'count'.
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Apache Accumulo Interactive Shell
+    - version: 1.5.0
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    -
+    - type 'help' for a list of available commands
+    -
+    username@instance> createtable wordCount
+    username@instance wordCount> setiter -class org.apache.accumulo.core.iterators.user.SummingCombiner -p 10 -t wordCount -majc -minc -scan
+    SummingCombiner interprets Values as Longs and adds them together. A variety of encodings (variable length, fixed length, or string) are available
+    ----------> set SummingCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.: false
+    ----------> set SummingCombiner parameter columns, <col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non-alphanum chars using %<hex>.: count
+    ----------> set SummingCombiner parameter lossy, if true, failed decodes are ignored. Otherwise combiner will error on failed decodes (default false): <TRUE|FALSE>: false
+    ----------> set SummingCombiner parameter type, <VARLEN|FIXEDLEN|STRING|fullClassName>: STRING
+    username@instance wordCount> quit
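
The combination of a map-only job and a SummingCombiner can be modeled with plain Java. The job writes the value "1" for each word occurrence. The combiner, configured above with type STRING, adds up the string-encoded longs stored under the same key.

This is a hypothetical stdlib sketch, not the WordCount source:

* Splitting on whitespace is an assumption, although tokens such as "time." and "tserver," in the scan output further down suggest punctuation is kept.
* Values are combined eagerly here, whereas the table applies the combiner during scans and compactions.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical model: emit "1" per word, sum string-encoded longs per key.
final class WordCountSketch {

    // SummingCombiner-style reduction of STRING-encoded long values.
    static String combine(Iterable<String> values) {
        long sum = 0;
        for (String v : values) {
            sum += Long.parseLong(v);
        }
        return Long.toString(sum);
    }

    // One "job run": emit "1" per token and fold it into the table's value.
    static Map<String, String> run(String text, Map<String, String> table) {
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            String prev = table.get(word);
            table.put(word, prev == null ? "1" : combine(Arrays.asList(prev, "1")));
        }
        return table;
    }
}
```

Running the sketch twice over the same text doubles every count. This matches the note further down about re-running a word count on the same table.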
+
+After creating the table, run the word count map reduce job.
+
+    $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.WordCount -i instance -z zookeepers --input /user/username/wc -t wordCount -u username -p password
+
+    11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1
+    11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003
+    11/02/07 18:20:13 INFO mapred.JobClient:  map 0% reduce 0%
+    11/02/07 18:20:20 INFO mapred.JobClient:  map 100% reduce 0%
+    11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003
+    11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6
+    11/02/07 18:20:22 INFO mapred.JobClient:   Job Counters
+    11/02/07 18:20:22 INFO mapred.JobClient:     Launched map tasks=1
+    11/02/07 18:20:22 INFO mapred.JobClient:     Data-local map tasks=1
+    11/02/07 18:20:22 INFO mapred.JobClient:   FileSystemCounters
+    11/02/07 18:20:22 INFO mapred.JobClient:     HDFS_BYTES_READ=10487
+    11/02/07 18:20:22 INFO mapred.JobClient:   Map-Reduce Framework
+    11/02/07 18:20:22 INFO mapred.JobClient:     Map input records=255
+    11/02/07 18:20:22 INFO mapred.JobClient:     Spilled Records=0
+    11/02/07 18:20:22 INFO mapred.JobClient:     Map output records=1452
+
+After the map reduce job completes, query the accumulo table to see word
+counts.
+
+    $ ./bin/accumulo shell -u username -p password
+    username@instance> table wordCount
+    username@instance wordCount> scan -b the
+    the count:20080906 []    75
+    their count:20080906 []    2
+    them count:20080906 []    1
+    then count:20080906 []    1
+    there count:20080906 []    1
+    these count:20080906 []    3
+    this count:20080906 []    6
+    through count:20080906 []    1
+    time count:20080906 []    3
+    time. count:20080906 []    1
+    to count:20080906 []    27
+    total count:20080906 []    1
+    tserver, count:20080906 []    1
+    tserver.compaction.major.concurrent.max count:20080906 []    1
+    ...
+
+Another example to look at is
+org.apache.accumulo.examples.simple.mapreduce.UniqueColumns. This example
+computes the unique set of columns in a table and shows how a map reduce job
+can directly read a table's files from HDFS.
+
+One more example available is
+org.apache.accumulo.examples.simple.mapreduce.TokenFileWordCount.
+The TokenFileWordCount example works exactly the same as the WordCount example
+explained above except that it uses a token file rather than giving the
+password directly to the map-reduce job (this avoids having the password
+displayed in the job's configuration which is world-readable).
+
+To create a token file, use the create-token utility:
+
+    $ ./bin/accumulo create-token
+
+It defaults to creating a PasswordToken, but you can specify the token class
+with -tc (requires the fully qualified class name). Based on the token class,
+it will prompt you for each property required to create the token.
+
+The last value it prompts for is a local filename to save to. If this file
+exists, it will append the new token to the end. Multiple tokens can exist in
+a file, but only the first one for each user will be recognized.
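
The first-token-wins rule can be expressed as a small stdlib sketch. The sketch treats the token file only as an ordered list of (user, token) pairs, in the order they were appended. The real file format is not modeled, and TokenFileSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "only the first token for each user is recognized".
final class TokenFileSketch {
    static Map<String, String> recognized(List<? extends Map.Entry<String, String>> appended) {
        Map<String, String> firstPerUser = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : appended) {
            // later tokens for a user who already has one are ignored
            firstPerUser.putIfAbsent(e.getKey(), e.getValue());
        }
        return firstPerUser;
    }
}
```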
+
+Rather than waiting for the prompts, you can specify some options when calling
+create-token. For example, the following command creates a token file
+containing a PasswordToken for user 'root' with password 'secret' and saves it
+to 'root.pw':
+
+    $ ./bin/accumulo create-token -u root -p secret -f root.pw
+
+This local file needs to be uploaded to hdfs to be used with the
+map-reduce job. For example, if the file were 'root.pw' in the local directory:
+
+    $ hadoop fs -put root.pw root.pw
+
+This would put 'root.pw' in the user's home directory in hdfs.
+
+Because the basic WordCount example uses Opts to parse its arguments
+(which extends ClientOnRequiredTable), you can use a token file with
+the basic WordCount example by calling the same command as explained above
+except replacing the password with the token file (rather than -p, use -tf).
+
+    $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.WordCount -i instance -z zookeepers --input /user/username/wc -t wordCount -u username -tf tokenfile
+
+In the above examples, username was 'root' and tokenfile was 'root.pw'.
+
+However, if you don't want to use the Opts class to parse arguments,
+the TokenFileWordCount is an example of using the token file manually.
+
+    $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TokenFileWordCount instance zookeepers username tokenfile /user/username/wc wordCount
+
+The results should be the same as the WordCount example except that the
+authentication token was not stored in the configuration. It was instead
+stored in a file that the map-reduce job pulled into the distributed cache.
+(If you ran either of these on the same table right after the
+WordCount example, then the resulting counts should just double.)
+
+
+
+

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/maxmutation.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/maxmutation.md b/docs/src/main/resources/examples/maxmutation.md
new file mode 100644
index 0000000..48c918a
--- /dev/null
+++ b/docs/src/main/resources/examples/maxmutation.md
@@ -0,0 +1,51 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo MaxMutation Constraints Example
+---
+
+This is an example of how to limit the size of mutations that will be accepted
+into a table. Under the default configuration, accumulo does not limit the
+size of mutations that can be ingested. Poorly behaved writers might
+inadvertently create mutations so large that they cause the tablet servers to
+run out of memory. A simple constraint can be added to a table to reject very
+large mutations.
+
+    $ ./bin/accumulo shell -u username -p password
+
+    Shell - Apache Accumulo Interactive Shell
+    -
+    - version: 1.5.0
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    -
+    - type 'help' for a list of available commands
+    -
+    username@instance> createtable test_ingest
+    username@instance test_ingest> config -t test_ingest -s table.constraint.1=org.apache.accumulo.examples.simple.constraints.MaxMutationSize
+    username@instance test_ingest>
+
+
+Now the table will reject any mutation that is larger than 1/256th of the
+working memory of the tablet server. The following command attempts to ingest
+a single row with 10000 columns, which exceeds the size limit. Depending on the
+amount of Java heap your tserver(s) are given, you may have to increase the
+number of columns provided to see the failure.
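
The limit arithmetic can be sketched as follows, assuming (as stated above) that the limit is 1/256th of the tablet server's maximum heap. Under that assumption, the 188160-byte limit in the error below corresponds to a tablet server heap of 188160 × 256 = 48,168,960 bytes, or about 46 MiB.

MaxMutationSketch is a hypothetical name, and the sketch does not use Accumulo's Constraint interface. It also assumes that only sizes strictly above the limit count as violations; the exact boundary is not confirmed.

```java
// Hypothetical sketch of the size rule; heap and mutation sizes are plain
// byte counts. The heap that matters is the tablet server's, not the client's.
final class MaxMutationSketch {

    // 1/256th of the heap, computed with a shift (x >> 8 == x / 256 for x >= 0).
    static long limitFor(long maxHeapBytes) {
        return maxHeapBytes >> 8;
    }

    // Assumed boundary: only mutations strictly larger than the limit violate.
    static boolean violates(long mutationBytes, long maxHeapBytes) {
        return mutationBytes > limitFor(maxHeapBytes);
    }
}
```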
+
+    $ ./bin/accumulo org.apache.accumulo.test.TestIngest -i instance -z zookeepers -u username -p password --rows 1 --cols 10000
+    ERROR : Constraint violates : ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.MaxMutationSize, violationCode:0, violationDescription:mutation exceeded maximum size of 188160, numberOfViolatingMutations:1)
+

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/regex.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/regex.md b/docs/src/main/resources/examples/regex.md
new file mode 100644
index 0000000..29d47e1
--- /dev/null
+++ b/docs/src/main/resources/examples/regex.md
@@ -0,0 +1,59 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Regex Example
+---
+
+This example uses mapreduce and accumulo to find items using regular expressions.
+This is accomplished using a map-only mapreduce job and a scan-time iterator.
+
+To run this example you will need some data in a table. The following will
+put a trivial amount of data into accumulo using the accumulo shell:
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Apache Accumulo Interactive Shell
+    - version: 1.5.0
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    -
+    - type 'help' for a list of available commands
+    -
+    username@instance> createtable input
+    username@instance> insert dogrow dogcf dogcq dogvalue
+    username@instance> insert catrow catcf catcq catvalue
+    username@instance> quit
+
+The RegexExample class sets an iterator on the scanner. This does pattern
+matching against each key/value in accumulo, and only returns matching items.
+It will do this in parallel and will store the results in files in hdfs.
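
The row filtering can be sketched with java.util.regex. This sketch assumes whole-string matching (Matcher.matches()), which would explain why the pattern is written 'dog.*' rather than 'dog'. Whether the example's iterator matches whole strings or substrings is an assumption here, and RegexSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of row filtering with whole-string regex matching.
final class RegexSketch {
    static List<String> matchingRows(List<String> rows, String rowRegex) {
        Pattern p = Pattern.compile(rowRegex); // compile once, reuse per row
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (p.matcher(row).matches()) { // entire row ID must match
                out.add(row);
            }
        }
        return out;
    }
}
```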
+
+The following will search for any rows in the input table that start with "dog":
+
+    $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.RegexExample -u username -p password -i instance -z zookeepers -t input --rowRegex 'dog.*' --output /tmp/output
+
+    $ hadoop fs -ls /tmp/output
+    Found 3 items
+    -rw-r--r--   1 username supergroup          0 2013-01-10 14:11 /tmp/output/_SUCCESS
+    drwxr-xr-x   - username supergroup          0 2013-01-10 14:10 /tmp/output/_logs
+    -rw-r--r--   1 username supergroup         51 2013-01-10 14:10 /tmp/output/part-m-00000
+
+We can see the output of our little map-reduce job:
+
+    $ hadoop fs -text /tmp/output/part-m-00000
+    dogrow dogcf:dogcq [] 1357844987994 false  dogvalue
+
+

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/reservations.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/reservations.md b/docs/src/main/resources/examples/reservations.md
new file mode 100644
index 0000000..6b4886c
--- /dev/null
+++ b/docs/src/main/resources/examples/reservations.md
@@ -0,0 +1,68 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Reservations Example
+---
+
+This example runs a simple reservation system implemented using conditional
+mutations. The system guarantees that, of several concurrent users, only one
+can reserve a given resource. The example's reserve command accepts multiple
+users; when several are given, it creates a separate reservation thread for
+each user. In the example below, threads are started for alice, bob, eve,
+mallory, and trent to reserve room06 on 20140101. Bob gets the reservation
+and everyone else is put on a wait list. The example code accepts any string
+for what, when, and who.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.reservations.ARS
+    >connect test16 localhost root secret ars
+      connected
+    >
+      Commands :
+        reserve <what> <when> <who> {who}
+        cancel <what> <when> <who>
+        list <what> <when>
+    >reserve room06 20140101 alice bob eve mallory trent
+                       bob : RESERVED
+                   mallory : WAIT_LISTED
+                     alice : WAIT_LISTED
+                     trent : WAIT_LISTED
+                       eve : WAIT_LISTED
+    >list room06 20140101
+      Reservation holder : bob
+      Wait list : [mallory, alice, trent, eve]
+    >cancel room06 20140101 alice
+    >cancel room06 20140101 bob
+    >list room06 20140101
+      Reservation holder : mallory
+      Wait list : [trent, eve]
+    >quit
+
+Scanning the table in the Accumulo shell after running the example shows the
+following:
+
+    root@test16> table ars
+    root@test16 ars> scan
+    room06:20140101 res:0001 []    mallory
+    room06:20140101 res:0003 []    trent
+    room06:20140101 res:0004 []    eve
+    room06:20140101 tx:seq []    6
+
+The tx:seq column is incremented for each update to the row, allowing for
+detection of concurrent changes. For an update to go through, the sequence
+number must not have changed since the data was read. If it does change,
+the conditional mutation will fail and the example code will retry.
+
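The read, check, and retry cycle described above can be sketched without a cluster. The class below is a hypothetical, in-memory stand-in, not the ARS source and not the Accumulo client API: a synchronized check on a `seq` field plays the role of the conditional mutation's condition on `tx:seq`, and the loop mirrors the example's retry. Run sequentially in the order of the wait list above, it ends with the same res entries as the scan (mallory, trent, eve at ids 1, 3, 4), but it is not meant to reproduce the exact tx:seq value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one reservation row such as room06:20140101.
// `seq` plays the role of the tx:seq column; `res` holds the res:NNNN entries,
// ordered by id, so the first entry is the reservation holder.
final class ReservationRowSketch {
  private long seq = 0;
  private int nextId = 0;
  private final TreeMap<Integer, String> res = new TreeMap<>();

  // What a reader sees: a consistent copy of the row at one point in time.
  private static final class Snapshot {
    final long seq;
    final int nextId;
    final TreeMap<Integer, String> res;
    Snapshot(long seq, int nextId, TreeMap<Integer, String> res) {
      this.seq = seq; this.nextId = nextId; this.res = res;
    }
  }

  private synchronized Snapshot read() {
    return new Snapshot(seq, nextId, new TreeMap<>(res));
  }

  // Imitates a conditional mutation: apply `change` only if seq still has the
  // value that was read, and bump seq on success.
  private synchronized boolean applyIfSeqIs(long expectedSeq, Runnable change) {
    if (seq != expectedSeq) {
      return false; // a concurrent update happened; caller must re-read and retry
    }
    change.run();
    seq++;
    return true;
  }

  String reserve(String who) {
    while (true) {
      Snapshot s = read();
      if (s.res.containsValue(who)) {
        return who.equals(s.res.firstEntry().getValue()) ? "RESERVED" : "WAIT_LISTED";
      }
      String status = s.res.isEmpty() ? "RESERVED" : "WAIT_LISTED";
      if (applyIfSeqIs(s.seq, () -> { res.put(s.nextId, who); nextId = s.nextId + 1; })) {
        return status;
      }
    }
  }

  void cancel(String who) {
    while (true) {
      Snapshot s = read();
      Integer id = null;
      for (Map.Entry<Integer, String> e : s.res.entrySet()) {
        if (e.getValue().equals(who)) id = e.getKey();
      }
      if (id == null) return; // nothing to cancel
      final int key = id;
      if (applyIfSeqIs(s.seq, () -> res.remove(key))) return;
    }
  }

  synchronized String holder() {
    return res.isEmpty() ? null : res.firstEntry().getValue();
  }

  synchronized List<String> waitList() {
    List<String> all = new ArrayList<>(res.values());
    return all.isEmpty() ? all : all.subList(1, all.size());
  }
}
```

Because the status is decided from the snapshot and the write only succeeds if `seq` is unchanged, at most one of several racing threads can observe an empty row and win; the others fail the check, re-read, and are wait-listed.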

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/rgbalancer.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/rgbalancer.md b/docs/src/main/resources/examples/rgbalancer.md
new file mode 100644
index 0000000..3c80861
--- /dev/null
+++ b/docs/src/main/resources/examples/rgbalancer.md
@@ -0,0 +1,161 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Balancer Example
+---
+
+For some data access patterns, it's important to spread groups of tablets within
+a table out evenly.  Accumulo has a balancer that can do this using a regular
+expression to group tablets. This example shows how this balancer spreads 4
+groups of tablets within a table evenly across 17 tablet servers.
+
+The commands below create a table and add splits.  For this example we would
+like all of the tablets whose split points start with the same two digits to
+be on different tservers.  This gives us four groups of tablets: 01, 02, 03,
+and 04.
+
+    root@accumulo> createtable testRGB
+    root@accumulo testRGB> addsplits -t testRGB 01b 01m 01r 01z 02b 02m 02r 02z 03b 03m 03r 03z 04a 04b 04c 04d 04e 04f 04g 04h 04i 04j 04k 04l 04m 04n 04o 04p
+    root@accumulo testRGB> tables -l
+    accumulo.metadata    =>        !0
+    accumulo.replication =>      +rep
+    accumulo.root        =>        +r
+    testRGB              =>         2
+    trace                =>         1
+
+After adding the splits, we look at the locations in the metadata table.
+
+    root@accumulo testRGB> scan -t accumulo.metadata -b 2; -e 2< -c loc
+    2;01b loc:34a5f6e086b000c []    ip-10-1-2-25:9997
+    2;01m loc:34a5f6e086b000c []    ip-10-1-2-25:9997
+    2;01r loc:14a5f6e079d0011 []    ip-10-1-2-15:9997
+    2;01z loc:14a5f6e079d000f []    ip-10-1-2-13:9997
+    2;02b loc:34a5f6e086b000b []    ip-10-1-2-26:9997
+    2;02m loc:14a5f6e079d000c []    ip-10-1-2-28:9997
+    2;02r loc:14a5f6e079d0012 []    ip-10-1-2-27:9997
+    2;02z loc:14a5f6e079d0012 []    ip-10-1-2-27:9997
+    2;03b loc:14a5f6e079d000d []    ip-10-1-2-21:9997
+    2;03m loc:14a5f6e079d000e []    ip-10-1-2-20:9997
+    2;03r loc:14a5f6e079d000d []    ip-10-1-2-21:9997
+    2;03z loc:14a5f6e079d000e []    ip-10-1-2-20:9997
+    2;04a loc:34a5f6e086b000b []    ip-10-1-2-26:9997
+    2;04b loc:14a5f6e079d0010 []    ip-10-1-2-17:9997
+    2;04c loc:14a5f6e079d0010 []    ip-10-1-2-17:9997
+    2;04d loc:24a5f6e07d3000c []    ip-10-1-2-16:9997
+    2;04e loc:24a5f6e07d3000d []    ip-10-1-2-29:9997
+    2;04f loc:24a5f6e07d3000c []    ip-10-1-2-16:9997
+    2;04g loc:24a5f6e07d3000a []    ip-10-1-2-14:9997
+    2;04h loc:14a5f6e079d000c []    ip-10-1-2-28:9997
+    2;04i loc:34a5f6e086b000d []    ip-10-1-2-19:9997
+    2;04j loc:34a5f6e086b000d []    ip-10-1-2-19:9997
+    2;04k loc:24a5f6e07d30009 []    ip-10-1-2-23:9997
+    2;04l loc:24a5f6e07d3000b []    ip-10-1-2-22:9997
+    2;04m loc:24a5f6e07d30009 []    ip-10-1-2-23:9997
+    2;04n loc:24a5f6e07d3000b []    ip-10-1-2-22:9997
+    2;04o loc:34a5f6e086b000a []    ip-10-1-2-18:9997
+    2;04p loc:24a5f6e07d30008 []    ip-10-1-2-24:9997
+    2< loc:24a5f6e07d30008 []    ip-10-1-2-24:9997
+
+Below, the information above has been rearranged to show which tablet groups
+are on each tserver.  The four tablets in group 03 are on only two tservers;
+ideally, those tablets would be spread across four tservers.  Note that the
+default tablet (2<) was categorized as group 04 below.
+
+    ip-10-1-2-13:9997 01
+    ip-10-1-2-14:9997 04
+    ip-10-1-2-15:9997 01
+    ip-10-1-2-16:9997 04 04
+    ip-10-1-2-17:9997 04 04
+    ip-10-1-2-18:9997 04
+    ip-10-1-2-19:9997 04 04
+    ip-10-1-2-20:9997 03 03
+    ip-10-1-2-21:9997 03 03
+    ip-10-1-2-22:9997 04 04
+    ip-10-1-2-23:9997 04 04
+    ip-10-1-2-24:9997 04 04
+    ip-10-1-2-25:9997 01 01
+    ip-10-1-2-26:9997 02 04
+    ip-10-1-2-27:9997 02 02
+    ip-10-1-2-28:9997 02 04
+    ip-10-1-2-29:9997 04
+
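The rearranged table above can be produced mechanically from the metadata scan. This is a hypothetical helper, not part of Accumulo, assuming each line has the shape `2;01b loc:<session> []    host:port`, that a tablet's group is the first two characters of its end row, and that the default tablet (`2<`) goes into a caller-supplied default group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that turns `scan -t accumulo.metadata ... -c loc` output into
// a per-tserver list of tablet groups. Assumes lines like
//   2;01b loc:34a5f6e086b000c []    ip-10-1-2-25:9997
// where the group is the first two characters of the end row (after ';'),
// and the default tablet ("2<") is put in `defaultGroup`.
final class TabletGroupTable {
  static Map<String, List<String>> groupsByServer(List<String> scanLines, String defaultGroup) {
    Map<String, List<String>> table = new TreeMap<>(); // sorted by server name
    for (String line : scanLines) {
      String[] parts = line.trim().split("\\s+");
      String extent = parts[0];                  // e.g. "2;01b" or "2<"
      String server = parts[parts.length - 1];   // e.g. "ip-10-1-2-25:9997"
      int semi = extent.indexOf(';');
      String group = semi < 0
          ? defaultGroup
          : extent.substring(semi + 1, Math.min(semi + 3, extent.length()));
      table.computeIfAbsent(server, k -> new ArrayList<>()).add(group);
    }
    table.values().forEach(groups -> groups.sort(null)); // natural order, e.g. "02 04"
    return table;
  }

  public static void main(String[] args) {
    List<String> lines = List.of(
        "2;03m loc:14a5f6e079d000e []    ip-10-1-2-20:9997",
        "2;03z loc:14a5f6e079d000e []    ip-10-1-2-20:9997",
        "2;04p loc:24a5f6e07d30008 []    ip-10-1-2-24:9997",
        "2< loc:24a5f6e07d30008 []    ip-10-1-2-24:9997");
    groupsByServer(lines, "04").forEach((server, groups) ->
        System.out.println(server + " " + String.join(" ", groups)));
  }
}
```

Fed the full scan, this yields one line per tserver in the same form as the table above.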
+To remedy this situation, the RegexGroupBalancer is configured with the
+commands below.  The configured regular expression selects the first two digits
+from a tablet's end row as the group id.  Tablets that don't match, as well as
+the default tablet, are assigned to group 04.
+
+    root@accumulo testRGB> config -t testRGB -s table.custom.balancer.group.regex.pattern=(\\d\\d).*
+    root@accumulo testRGB> config -t testRGB -s table.custom.balancer.group.regex.default=04
+    root@accumulo testRGB> config -t testRGB -s table.balancer=org.apache.accumulo.server.master.balancer.RegexGroupBalancer
+
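The grouping rule configured above can be sketched in plain Java. This is not the RegexGroupBalancer source: it assumes the pattern must match the whole end row, that capture group 1 is the group id, and that the doubled backslashes in the shell command are shell escaping for the regular expression `(\d\d).*`. `maxPerServer` is a hypothetical helper giving one simple reading of "evenly spread".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the grouping rule configured above (not the RegexGroupBalancer source).
// Assumptions: the pattern must match the whole end row, capture group 1 is the
// group id, and tablets without an end row (the default tablet) use the default group.
final class RegexGroupSketch {
  // Java literal for the regex (\d\d).*; the doubled backslashes in the shell
  // command are assumed to be shell escaping.
  static final Pattern GROUP_PATTERN = Pattern.compile("(\\d\\d).*");

  static String groupOf(String endRow, Pattern pattern, String defaultGroup) {
    if (endRow == null) {
      return defaultGroup; // default tablet, shown as 2< in the metadata table
    }
    Matcher m = pattern.matcher(endRow);
    return m.matches() ? m.group(1) : defaultGroup;
  }

  // One way to state "evenly spread": no tserver holds more than
  // ceil(tabletsInGroup / tservers) tablets of the same group.
  static int maxPerServer(int tabletsInGroup, int tservers) {
    return (tabletsInGroup + tservers - 1) / tservers;
  }
}
```

With 17 tservers, groups 01 to 03 (four tablets each) and group 04 (04a through 04p plus the default tablet, 17 in all) should each have at most one tablet per tserver, which is the layout reached in the rebalanced locations later in this example.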
+After waiting a little bit, look at the tablet locations again and all is good.
+
+    root@accumulo testRGB> scan -t accumulo.metadata -b 2; -e 2< -c loc
+    2;01b loc:34a5f6e086b000a []    ip-10-1-2-18:9997
+    2;01m loc:34a5f6e086b000c []    ip-10-1-2-25:9997
+    2;01r loc:14a5f6e079d0011 []    ip-10-1-2-15:9997
+    2;01z loc:14a5f6e079d000f []    ip-10-1-2-13:9997
+    2;02b loc:34a5f6e086b000b []    ip-10-1-2-26:9997
+    2;02m loc:14a5f6e079d000c []    ip-10-1-2-28:9997
+    2;02r loc:34a5f6e086b000d []    ip-10-1-2-19:9997
+    2;02z loc:14a5f6e079d0012 []    ip-10-1-2-27:9997
+    2;03b loc:24a5f6e07d3000d []    ip-10-1-2-29:9997
+    2;03m loc:24a5f6e07d30009 []    ip-10-1-2-23:9997
+    2;03r loc:14a5f6e079d000d []    ip-10-1-2-21:9997
+    2;03z loc:14a5f6e079d000e []    ip-10-1-2-20:9997
+    2;04a loc:34a5f6e086b000b []    ip-10-1-2-26:9997
+    2;04b loc:34a5f6e086b000c []    ip-10-1-2-25:9997
+    2;04c loc:14a5f6e079d0010 []    ip-10-1-2-17:9997
+    2;04d loc:14a5f6e079d000e []    ip-10-1-2-20:9997
+    2;04e loc:24a5f6e07d3000d []    ip-10-1-2-29:9997
+    2;04f loc:24a5f6e07d3000c []    ip-10-1-2-16:9997
+    2;04g loc:24a5f6e07d3000a []    ip-10-1-2-14:9997
+    2;04h loc:14a5f6e079d000c []    ip-10-1-2-28:9997
+    2;04i loc:14a5f6e079d0011 []    ip-10-1-2-15:9997
+    2;04j loc:34a5f6e086b000d []    ip-10-1-2-19:9997
+    2;04k loc:14a5f6e079d0012 []    ip-10-1-2-27:9997
+    2;04l loc:14a5f6e079d000f []    ip-10-1-2-13:9997
+    2;04m loc:24a5f6e07d30009 []    ip-10-1-2-23:9997
+    2;04n loc:24a5f6e07d3000b []    ip-10-1-2-22:9997
+    2;04o loc:34a5f6e086b000a []    ip-10-1-2-18:9997
+    2;04p loc:14a5f6e079d000d []    ip-10-1-2-21:9997
+    2< loc:24a5f6e07d30008 []    ip-10-1-2-24:9997
+
+Once again, the data above is transformed to make it easier to see which groups
+are on which tservers.  The transformed data below shows that all groups are now
+evenly spread.
+
+    ip-10-1-2-13:9997 01 04
+    ip-10-1-2-14:9997    04
+    ip-10-1-2-15:9997 01 04
+    ip-10-1-2-16:9997    04
+    ip-10-1-2-17:9997    04
+    ip-10-1-2-18:9997 01 04
+    ip-10-1-2-19:9997 02 04
+    ip-10-1-2-20:9997 03 04
+    ip-10-1-2-21:9997 03 04
+    ip-10-1-2-22:9997    04
+    ip-10-1-2-23:9997 03 04
+    ip-10-1-2-24:9997    04
+    ip-10-1-2-25:9997 01 04
+    ip-10-1-2-26:9997 02 04
+    ip-10-1-2-27:9997 02 04
+    ip-10-1-2-28:9997 02 04
+    ip-10-1-2-29:9997 03 04
+
+If you need this functionality but a regular expression does not meet your
+needs, extend GroupBalancer.  This allows you to specify a partitioning
+function in Java.  Use the RegexGroupBalancer source as an example.

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/rowhash.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/rowhash.md b/docs/src/main/resources/examples/rowhash.md
new file mode 100644
index 0000000..9cd71a7
--- /dev/null
+++ b/docs/src/main/resources/examples/rowhash.md
@@ -0,0 +1,61 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo RowHash Example
+---
+
+This example shows a simple map/reduce job that reads from an Accumulo table and
+writes back into that table.
+
+To run this example you will need some data in a table. The following will
+put a trivial amount of data into Accumulo using the Accumulo shell:
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Apache Accumulo Interactive Shell
+    - version: 1.5.0
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    -
+    - type 'help' for a list of available commands
+    -
+    username@instance> createtable input
+    username@instance> insert a-row cf cq value
+    username@instance> insert b-row cf cq value
+    username@instance> quit
+
+For each row in the table that contains the specified column, the RowHash class
+inserts an MD5 hash of that column's value, Base64-encoded. Here's how you run
+the map/reduce job:
+
+    $ ./contrib/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.RowHash -u user -p passwd -i instance -t input --column cf:cq
+
+Now we can scan the table and see the hashes:
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Apache Accumulo Interactive Shell
+    - version: 1.5.0
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    -
+    - type 'help' for a list of available commands
+    -
+    username@instance> scan -t input
+    a-row cf:cq []    value
+    a-row cf-HASHTYPE:cq-MD5BASE64 []    IGPBYI1uC6+AJJxC4r5YBA==
+    b-row cf:cq []    value
+    b-row cf-HASHTYPE:cq-MD5BASE64 []    IGPBYI1uC6+AJJxC4r5YBA==
+    username@instance>
+
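The hash in the scan output can be reproduced with the JDK alone: it is the MD5 digest of the cell value `value`, Base64-encoded, which is also what the `MD5BASE64` qualifier suffix suggests. This is a sketch, not the RowHash source; the class and method names are hypothetical, and the derived column names simply mirror the output above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// JDK-only sketch reproducing the hash column written by the job above.
final class RowHashSketch {
  // MD5 of the value's bytes, then standard Base64 (with padding).
  static String md5Base64(String value) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5")
          .digest(value.getBytes(StandardCharsets.UTF_8));
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide MD5.
      throw new IllegalStateException(e);
    }
  }

  // Column naming as seen in the scan: cf -> cf-HASHTYPE, cq -> cq-MD5BASE64.
  static String hashFamily(String family) { return family + "-HASHTYPE"; }
  static String hashQualifier(String qualifier) { return qualifier + "-MD5BASE64"; }

  public static void main(String[] args) {
    System.out.println(hashFamily("cf") + ":" + hashQualifier("cq") + " " + md5Base64("value"));
  }
}
```

Both rows get the same hash because both hold the same value; the row id is not part of the hashed input.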

http://git-wip-us.apache.org/repos/asf/accumulo/blob/52d526b9/docs/src/main/resources/examples/sample.md
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/sample.md b/docs/src/main/resources/examples/sample.md
new file mode 100644
index 0000000..432067e
--- /dev/null
+++ b/docs/src/main/resources/examples/sample.md
@@ -0,0 +1,193 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+---
+title: Apache Accumulo Sampling Example
+---
+
+Basic Sampling Example
+----------------------
+
+Accumulo supports building a set of sample data that can be efficiently
+accessed by scanners.  What data is included in the sample set is configurable.
+Below, some data representing documents are inserted.  
+
+    root@instance sampex> createtable sampex
+    root@instance sampex> insert 9255 doc content 'abcde'
+    root@instance sampex> insert 9255 doc url file://foo.txt
+    root@instance sampex> insert 8934 doc content 'accumulo scales'
+    root@instance sampex> insert 8934 doc url file://accumulo_notes.txt
+    root@instance sampex> insert 2317 doc content 'milk, eggs, bread, parmigiano-reggiano'
+    root@instance sampex> insert 2317 doc url file://groceries/9.txt
+    root@instance sampex> insert 3900 doc content 'EC2 ate my homework'
+    root@instance sampex> insert 3900 doc uril file://final_project.txt
+
+Below, the table sampex is configured to build a sample set.  The configuration
+causes Accumulo to include any row where `murmur3_32(row) % 3 == 0` in the
+table's sample data.
+
+    root@instance sampex> config -t sampex -s table.sampler.opt.hasher=murmur3_32
+    root@instance sampex> config -t sampex -s table.sampler.opt.modulus=3
+    root@instance sampex> config -t sampex -s table.sampler=org.apache.accumulo.core.client.sample.RowSampler
+
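The sampling rule above, `murmur3_32(row) % modulus == 0`, can be sketched with a direct implementation of MurmurHash3 x86_32. This is not Accumulo's code: feeding the hash the row's UTF-8 bytes with seed 0 is an assumption, so whether it selects exactly the rows in the session below depends on that assumption holding. `RowSampleSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the sampling rule: a row is in the sample when
// murmur3_32(row) % modulus == 0. Hashing the row's UTF-8 bytes with
// seed 0 is an assumption about what Accumulo feeds the hash.
final class RowSampleSketch {
  // MurmurHash3 x86_32.
  static int murmur3_32(byte[] data, int seed) {
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;
    int h1 = seed;
    int len = data.length;
    int roundedEnd = len & ~3; // whole 4-byte blocks first

    for (int i = 0; i < roundedEnd; i += 4) {
      // little-endian 32-bit block
      int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
          | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
      k1 *= c1;
      k1 = Integer.rotateLeft(k1, 15);
      k1 *= c2;
      h1 ^= k1;
      h1 = Integer.rotateLeft(h1, 13);
      h1 = h1 * 5 + 0xe6546b64;
    }

    // tail: remaining 1-3 bytes (cases intentionally fall through)
    int k1 = 0;
    switch (len & 3) {
      case 3:
        k1 = (data[roundedEnd + 2] & 0xff) << 16;
      case 2:
        k1 |= (data[roundedEnd + 1] & 0xff) << 8;
      case 1:
        k1 |= data[roundedEnd] & 0xff;
        k1 *= c1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= c2;
        h1 ^= k1;
      default:
        break;
    }

    // finalization mix
    h1 ^= len;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  static boolean inSample(String row, int modulus) {
    int h = murmur3_32(row.getBytes(StandardCharsets.UTF_8), 0);
    return h % modulus == 0; // a negative hash still gives remainder 0 when divisible
  }
}
```

Because the decision depends only on the row, every entry of a row is in the sample or none is, which is why the sample scans below return whole rows.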
+Below, attempting to scan the sample returns an error.  This is because data
+was inserted before the sample set was configured.
+
+    root@instance sampex> scan --sample
+    2015-09-09 12:21:50,643 [shell.Shell] ERROR: org.apache.accumulo.core.client.SampleNotPresentException: Table sampex(ID:2) does not have sampling configured or built
+
+To remedy this problem, the following command will flush in-memory data and
+compact any files that do not contain the correct sample data.
+
+    root@instance sampex> compact -t sampex --sf-no-sample
+
+After the compaction, the sample scan works.  
+
+    root@instance sampex> scan --sample
+    2317 doc:content []    milk, eggs, bread, parmigiano-reggiano
+    2317 doc:url []    file://groceries/9.txt
+
+The commands below show that updates to data in the sample are seen when
+scanning the sample.
+
+    root@instance sampex> insert 2317 doc content 'milk, eggs, bread, parmigiano-reggiano, butter'
+    root@instance sampex> scan --sample
+    2317 doc:content []    milk, eggs, bread, parmigiano-reggiano, butter
+    2317 doc:url []    file://groceries/9.txt
+
+In order to make scanning the sample fast, sample data is partitioned as data is
+written to Accumulo.  This means that if the sample configuration is changed,
+data written previously was partitioned using different criteria.  Accumulo
+will detect this situation and fail sample scans.  The commands below show this
+failure and fix the problem with a compaction.
+
+    root@instance sampex> config -t sampex -s table.sampler.opt.modulus=2
+    root@instance sampex> scan --sample
+    2015-09-09 12:22:51,058 [shell.Shell] ERROR: org.apache.accumulo.core.client.SampleNotPresentException: Table sampex(ID:2) does not have sampling configured or built
+    root@instance sampex> compact -t sampex --sf-no-sample
+    2015-09-09 12:23:07,242 [shell.Shell] INFO : Compaction of table sampex started for given range
+    root@instance sampex> scan --sample
+    2317 doc:content []    milk, eggs, bread, parmigiano-reggiano
+    2317 doc:url []    file://groceries/9.txt
+    3900 doc:content []    EC2 ate my homework
+    3900 doc:uril []    file://final_project.txt
+    9255 doc:content []    abcde
+    9255 doc:url []    file://foo.txt
+
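The rule behind these failures can be modeled in a few lines. This is a hypothetical model, not Accumulo's code: a file can serve a sample scan only if it contains sample data and the sampler class and options recorded in it equal the table's current sampler configuration. Files failing the check are the ones the `compact ... --sf-no-sample` command rewrites.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical model of when a file can serve a sample scan.
final class SampleConfigCheck {
  static boolean canServeSampleScan(String tableSampler, Map<String, String> tableOptions,
                                    String fileSampler, Map<String, String> fileOptions) {
    if (fileSampler == null) {
      return false; // file written before sampling was configured: no sample data in it
    }
    // Sampler class and every option (e.g. hasher, modulus) must match exactly.
    return Objects.equals(tableSampler, fileSampler)
        && Objects.equals(tableOptions, fileOptions);
  }
}
```

Changing `modulus` from 3 to 2 makes every existing file fail this check until it is recompacted, matching the error and the fix shown above.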
+The example above is replicated in a Java program using the Accumulo API.
+Below is the program name and the command to run it.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.sample.SampleExample -i instance -z localhost -u root -p secret
+
+The commands below look under the hood to give some insight into how this
+feature works.  The commands determine what files the sampex table is using.
+
+    root@instance sampex> tables -l
+    accumulo.metadata    =>        !0
+    accumulo.replication =>      +rep
+    accumulo.root        =>        +r
+    sampex               =>         2
+    trace                =>         1
+    root@instance sampex> scan -t accumulo.metadata -c file -b 2 -e 2<
+    2< file:hdfs://localhost:10000/accumulo/tables/2/default_tablet/A000000s.rf []    702,8
+
+Below is the output of running `accumulo rfile-info` on the file above.  It
+shows the rfile has a normal default locality group and a sample default
+locality group.  The output also shows the configuration used to create the
+sample locality group.  The sample configuration within an rfile must match the
+table's sample configuration for sample scans to work.
+
+    $ ./bin/accumulo rfile-info hdfs://localhost:10000/accumulo/tables/2/default_tablet/A000000s.rf
+    Reading file: hdfs://localhost:10000/accumulo/tables/2/default_tablet/A000000s.rf
+    RFile Version            : 8
+    
+    Locality group           : <DEFAULT>
+       Start block            : 0
+       Num   blocks           : 1
+       Index level 0          : 35 bytes  1 blocks
+       First key              : 2317 doc:content [] 1437672014986 false
+       Last key               : 9255 doc:url [] 1437672014875 false
+       Num entries            : 8
+       Column families        : [doc]
+    
+    Sample Configuration     :
+       Sampler class          : org.apache.accumulo.core.client.sample.RowSampler
+       Sampler options        : {hasher=murmur3_32, modulus=2}
+
+    Sample Locality group    : <DEFAULT>
+       Start block            : 0
+       Num   blocks           : 1
+       Index level 0          : 36 bytes  1 blocks
+       First key              : 2317 doc:content [] 1437672014986 false
+       Last key               : 9255 doc:url [] 1437672014875 false
+       Num entries            : 6
+       Column families        : [doc]
+    
+    Meta block     : BCFile.index
+          Raw size             : 4 bytes
+          Compressed size      : 12 bytes
+          Compression type     : gz
+
+    Meta block     : RFile.index
+          Raw size             : 309 bytes
+          Compressed size      : 176 bytes
+          Compression type     : gz
+
+
+Shard Sampling Example
+----------------------
+
+The [shard example][shard] shows how to index and search files using Accumulo.  That
+example indexes documents into a table named `shard`.  The indexing scheme used
+in that example places the document name in the column qualifier.  A useful
+sample of this indexing scheme should contain all data for any document in the
+sample.   To accomplish this, the following commands build a sample for the
+shard table based on the column qualifier.
+
+    root@instance shard> config -t shard -s table.sampler.opt.hasher=murmur3_32
+    root@instance shard> config -t shard -s table.sampler.opt.modulus=101
+    root@instance shard> config -t shard -s table.sampler.opt.qualifier=true
+    root@instance shard> config -t shard -s table.sampler=org.apache.accumulo.core.client.sample.RowColumnSampler
+    root@instance shard> compact -t shard --sf-no-sample -w
+    2015-07-23 15:00:09,280 [shell.Shell] INFO : Compacting table ...
+    2015-07-23 15:00:10,134 [shell.Shell] INFO : Compaction of table shard completed for given range
+
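Why sampling on the column qualifier keeps whole documents together can be shown with a toy model. Each entry is `{row, family, qualifier}`, and the qualifier holds the document name, so all entries for one document share one sampling decision. This sketch assumes `qualifier=true` (with the other RowColumnSampler options left unset) means only the qualifier is hashed; `String.hashCode()` stands in for murmur3_32 purely as a placeholder and is not the hash Accumulo uses. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of qualifier-only sampling. String.hashCode() is a placeholder
// for murmur3_32, used only to show the all-or-nothing effect per document.
final class QualifierSampleSketch {
  static boolean inSample(String qualifier, int modulus) {
    return qualifier.hashCode() % modulus == 0;
  }

  // Each entry is {row, family, qualifier}.
  static List<String[]> sample(List<String[]> entries, int modulus) {
    List<String[]> kept = new ArrayList<>();
    for (String[] entry : entries) {
      if (inSample(entry[2], modulus)) { // decision depends only on the qualifier
        kept.add(entry);
      }
    }
    return kept;
  }
}
```

Whatever the modulus, a document's entries under different words or rows are all kept or all dropped, so a query over the sample sees complete documents.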
+After enabling sampling, the command below counts the number of documents in
+the sample containing the words `import` and `int`.     
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query --sample -i instance16 -z localhost -t shard -u root -p secret import int | fgrep '.java' | wc
+         11      11    1246
+
+The command below counts the total number of documents containing the words
+`import` and `int`.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query -i instance16 -z localhost -t shard -u root -p secret import int | fgrep '.java' | wc
+       1085    1085  118175
+
+The count of 11 out of 1085 total is close to what would be expected for a
+modulus of 101.  Querying the sample first provides a quick way to estimate how
+much data the real query will bring back.
+
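The arithmetic behind "close to what would be expected", assuming the hash spreads document names uniformly so that about one document in 101 lands in the sample:

```latex
\frac{1085}{101} \approx 10.74 \ \text{expected sample matches},
\qquad
11 \times 101 = 1111 \approx 1085 \ \text{estimated total matches}
```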
+Another way sample data could be used with the shard example is with a
+specialized iterator.  In the examples source code there is an iterator named
+CutoffIntersectingIterator.  This iterator first checks how many documents are
+found in the sample data.  If too many documents are found in the sample data,
+then it returns nothing.   Otherwise it proceeds to query the full data set.
+To experiment with this iterator, use the following command.  The
+`--sampleCutoff` option below causes the query to return nothing if, based on
+the sample, it appears the query would return more than 1000 documents.
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query --sampleCutoff 1000 -i instance16 -z localhost -t shard -u root -p secret import int | fgrep '.java' | wc
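A minimal sketch of the cutoff decision, in the spirit of CutoffIntersectingIterator as described above. The scaling by the modulus and the exact comparison are assumptions, not taken from the iterator's source, and `SampleCutoffSketch` is a hypothetical name.

```java
// Hypothetical cutoff rule: estimate the full result size from the sample
// (sample matches times modulus) and skip the full query above the cutoff.
final class SampleCutoffSketch {
  // Returns true if the full query should run, false if it should return nothing.
  static boolean shouldRunFullQuery(long sampleMatches, int modulus, long cutoff) {
    long estimate = sampleMatches * (long) modulus;
    return estimate <= cutoff;
  }
}
```

Under this rule, 11 sample matches with a modulus of 101 estimate 1111 documents, which exceeds a cutoff of 1000, so the full query would be skipped.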
