mikewalch closed pull request #49: Improved design documentation of tablet server
URL: https://github.com/apache/accumulo-website/pull/49

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/_docs-2-0/administration/caching.md b/_docs-2-0/administration/caching.md
index 7c5ce527..bdd591a0 100644
--- a/_docs-2-0/administration/caching.md
+++ b/_docs-2-0/administration/caching.md
@@ -4,26 +4,27 @@ category: administration
 order: 11
 ---
 
-Accumulo tablet servers have a **block cache** that buffers data in memory to limit reads from disk.
+Accumulo [tablet servers][tserver] have **block caches** that buffer data in memory to limit reads from disk.
 This caching has the following benefits:
 
 * reduces latency when reading data
 * helps alleviate hotspots in tables
 
-The block cache stores index and data blocks. A typical Accumulo read operation will perform a binary search
-over several index blocks followed by a linear scan of one or more data blocks. Each tablet server
-has its own block cache that is shared by all hosted tablets. Therefore, block caches are only enabled
+Each tablet server has an index and data block cache that is shared by all hosted tablets (see the [tablet server diagram][tserver]
+to learn more). A typical Accumulo read operation will perform a binary search over several index blocks followed by a linear scan
+of one or more data blocks. If these blocks are not in a cache, they will need to be retrieved from [RFiles] in HDFS. While the index
+block cache is enabled for all tables, the data block cache has to be enabled for a table by the user. It is typically only enabled
 for tables where read performance is critical.
 
 ## Configuration
 
-While the block cache is enabled by default for the Accumulo metadata tables, it must be enabled
-for all other tables by setting the following table properties to `true`:
+The index and data block caches are configured for tables by the following properties:
 
-* [table.cache.block.enable] - enables data block cache on the table
-* [table.cache.index.enable] - enables index block cache on the table
+* [table.cache.block.enable] - enables data block cache on the table (default is `false`)
+* [table.cache.index.enable] - enables index block cache on the table (default is `true`)
 
-These properties can be set in the Accumulo shell using the following command:
+While the index block cache is enabled by default for all Accumulo tables, users must enable the data block cache by
+setting [table.cache.block.enable] to `true` in the shell:
 
     config -t mytable -s table.cache.block.enable=true
 
@@ -33,12 +34,14 @@ Or programmatically using [TableOperations.setProperty()][tableops]:
 conn.tableOperations().setProperty("mytable", "table.cache.block.enable", "true");
 ```
 
-The sizes of the index and data block caches can be changed from their defaults by setting
-the following properties:
+The size of the index and data block caches (which are shared by all tablets of a tablet server) can be changed from
+their defaults by setting the following properties:
 
 * [tserver.cache.data.size]
 * [tserver.cache.index.size]
 
+[tserver]: {{ page.docs_baseurl }}/getting-started/design#tablet-server-1
+[RFiles]: {{ page.docs_baseurl}}/getting-started/design#rfile
 [table.cache.block.enable]: {{ page.docs_baseurl }}/administration/properties#table_cache_block_enable
 [table.cache.index.enable]: {{ page.docs_baseurl }}/administration/properties#table_cache_index_enable
 [tserver.cache.data.size]: {{ page.docs_baseurl }}/administration/properties#tserver_cache_data_size
diff --git a/_docs-2-0/getting-started/design.md b/_docs-2-0/getting-started/design.md
index 26e90484..7f6a880f 100644
--- a/_docs-2-0/getting-started/design.md
+++ b/_docs-2-0/getting-started/design.md
@@ -36,7 +36,7 @@ one Master server and many Clients.
 The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a
 write-ahead log, sorting new key-value pairs in memory, periodically
 flushing sorted key-value pairs to new files in HDFS, and responding
-to reads from clients, forming a merge-sorted view of all keys and
+to reads from clients, forming a sorted merge view of all keys and
 values from all the files it has created and the sorted in-memory
 store.
 
@@ -102,7 +102,7 @@ ingest and query load is balanced across the cluster.
 
 ![data distribution]({{ site.url }}/images/docs/data_distribution.png)
 
-## Tablet Service
+## Tablet Server
 
 When a write arrives at a TabletServer it is written to a Write-Ahead Log and
 then inserted into a sorted data structure in memory called a MemTable. When the
@@ -112,10 +112,14 @@ called a minor compaction. A new MemTable is then created and the fact of the
 compaction is recorded in the Write-Ahead Log.
 
 When a request to read data arrives at a TabletServer, the TabletServer does a
-binary search across the MemTable as well as the in-memory indexes associated
-with each RFile to find the relevant values. If clients are performing a scan,
-several key-value pairs are returned to the client in order from the MemTable
-and the set of RFiles by performing a merge-sort as they are read.
+binary search across the MemTable as well as the index blocks associated with each RFile
+to find the relevant values. If clients are performing a scan, several key-value pairs
+are returned to the client in order from the MemTable and data blocks of RFiles by performing
+a sorted merge as they are read. If [caching] is enabled for the table, any index or data
+block is stored in the block cache to speed up future scans.
+
+![tablet server diagram]({{ site.url }}/images/docs/tablet_server.png)
+<!-- Source at https://docs.google.com/presentation/d/1yEBNM044FxrzksVfxU35WDbxcVWUYUMy3tgRP75dzus/edit?usp=sharing -->
 
 ## RFile
 
@@ -178,3 +182,4 @@ TabletServer failures are noted on the Master's monitor page, accessible via
 [clients]: {{page.docs_baseurl}}/getting-started/clients
 [merging]: {{page.docs_baseurl}}/getting-started/table_configuration#merging-tablets
 [compaction]: {{page.docs_baseurl}}/getting-started/table_configuration#compaction
+[caching]: {{page.docs_baseurl}}/administration/caching
diff --git a/images/docs/tablet_server.png b/images/docs/tablet_server.png
new file mode 100644
index 00000000..2581dd01
Binary files /dev/null and b/images/docs/tablet_server.png differ
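
For readers of this diff, the read path described in the updated docs (a binary search over index entries, then a linear scan of one data block, with blocks kept in a cache to avoid repeated file reads) can be sketched roughly as follows. This is a toy model, not Accumulo code: `BlockCacheSketch`, `DataBlock`, `fileReads`, and `demo()` are hypothetical names, the "file" is an in-memory array standing in for an RFile in HDFS, and the cache is a simple LRU bounded by block count, which is an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a cached block read path; all names here are hypothetical.
class BlockCacheSketch {
    /** Stand-in for a data block: sorted keys with parallel values. */
    static final class DataBlock {
        final String[] keys;
        final String[] values;
        DataBlock(String[] keys, String[] values) {
            this.keys = keys;
            this.values = values;
        }
    }

    private final String[] index;      // last key of each data block, in sorted order
    private final DataBlock[] file;    // stand-in for blocks stored in a file
    private final Map<Integer, DataBlock> cache;
    int fileReads = 0;                 // counts simulated reads that missed the cache

    BlockCacheSketch(DataBlock[] file, final int cacheCapacity) {
        this.file = file;
        this.index = new String[file.length];
        for (int i = 0; i < file.length; i++) {
            index[i] = file[i].keys[file[i].keys.length - 1];
        }
        // Access-ordered LinkedHashMap evicts the least recently used block.
        this.cache = new LinkedHashMap<Integer, DataBlock>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, DataBlock> eldest) {
                return size() > cacheCapacity;
            }
        };
    }

    /** Binary search for the first block whose last key is >= key; -1 if none. */
    int findBlock(String key) {
        int lo = 0, hi = index.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index[mid].compareTo(key) >= 0) {
                found = mid;
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return found;
    }

    /** Looks up a key: index search, cached block fetch, then a linear scan. */
    String lookup(String key) {
        int blockNo = findBlock(key);
        if (blockNo < 0) {
            return null;               // key sorts after every block
        }
        DataBlock block = cache.get(blockNo);
        if (block == null) {           // cache miss: read from the simulated file
            block = file[blockNo];
            fileReads++;
            cache.put(blockNo, block);
        }
        for (int i = 0; i < block.keys.length; i++) {
            int cmp = block.keys[i].compareTo(key);
            if (cmp == 0) return block.values[i];
            if (cmp > 0) break;        // keys are sorted, so the key is absent
        }
        return null;
    }

    /** Builds a three-block sample with room for two cached blocks. */
    static BlockCacheSketch demo() {
        DataBlock[] blocks = {
            new DataBlock(new String[] {"a", "b"}, new String[] {"1", "2"}),
            new DataBlock(new String[] {"c", "d"}, new String[] {"3", "4"}),
            new DataBlock(new String[] {"e", "f"}, new String[] {"5", "6"}),
        };
        return new BlockCacheSketch(blocks, 2);
    }

    public static void main(String[] args) {
        BlockCacheSketch s = demo();
        System.out.println(s.lookup("c") + " after " + s.fileReads + " file read(s)");
        System.out.println(s.lookup("d") + " after " + s.fileReads + " file read(s)");
    }
}
```

In this sketch, a second lookup that lands in the same block does not increase `fileReads`, which is the effect the docs attribute to enabling `table.cache.block.enable` for read-heavy tables.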


 
