mikewalch closed pull request #49: Improved design documentation of tablet server
URL: https://github.com/apache/accumulo-website/pull/49
This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/_docs-2-0/administration/caching.md b/_docs-2-0/administration/caching.md
index 7c5ce527..bdd591a0 100644
--- a/_docs-2-0/administration/caching.md
+++ b/_docs-2-0/administration/caching.md
@@ -4,26 +4,27 @@ category: administration
 order: 11
 ---
 
-Accumulo tablet servers have a **block cache** that buffers data in memory to limit reads from disk.
+Accumulo [tablet servers][tserver] have **block caches** that buffer data in memory to limit reads from disk.
 This caching has the following benefits:
 
 * reduces latency when reading data
 * helps alleviate hotspots in tables
 
-The block cache stores index and data blocks. A typical Accumulo read operation will perform a binary search
-over several index blocks followed by a linear scan of one or more data blocks. Each tablet server
-has its own block cache that is shared by all hosted tablets. Therefore, block caches are only enabled
+Each tablet server has an index and data block cache that is shared by all hosted tablets (see the [tablet server diagram][tserver]
+to learn more). A typical Accumulo read operation will perform a binary search over several index blocks followed by a linear scan
+of one or more data blocks. If these blocks are not in a cache, they will need to be retrieved from [RFiles] in HDFS. While the index
+block cache is enabled for all tables, the data block cache has to be enabled for a table by the user. It is typically only enabled
 for tables where read performance is critical.
 
 ## Configuration
 
-While the block cache is enabled by default for the Accumulo metadata tables, it must be enabled
-for all other tables by setting the following table properties to `true`:
+The index and data block caches are configured for tables by the following properties:
 
-* [table.cache.block.enable] - enables data block cache on the table
-* [table.cache.index.enable] - enables index block cache on the table
+* [table.cache.block.enable] - enables data block cache on the table (default is `false`)
+* [table.cache.index.enable] - enables index block cache on the table (default is `true`)
 
-These properties can be set in the Accumulo shell using the following command:
+While the index block cache is enabled by default for all Accumulo tables, users must enable the data block cache by
+settting [table.cache.block.enable] to `true` in the shell:
 
     config -t mytable -s table.cache.block.enable=true
 
@@ -33,12 +34,14 @@ Or programatically using [TableOperations.setProperty()][tableops]:
 conn.tableOperations().setProperty("mytable", "table.cache.block.enable", "true");
 ```
 
-The sizes of the index and data block caches can be changed from their defaults by setting
-the following properties:
+The size of the index and data block caches (which are shared by all tablets of tablet server) can be changed from
+their defaults by setting the following properties:
 
 * [tserver.cache.data.size]
 * [tserver.cache.index.size]
 
+[tserver]: {{ page.docs_baseurl }}/getting-started/design#tablet-server-1
+[RFiles]: {{ page.docs_baseurl}}/getting-started/design#rfile
 [table.cache.block.enable]: {{ page.docs_baseurl }}/administration/properties#table_cache_block_enable
 [table.cache.index.enable]: {{ page.docs_baseurl }}/administration/properties#table_cache_index_enable
 [tserver.cache.data.size]: {{ page.docs_baseurl }}/administration/properties#tserver_cache_data_size
diff --git a/_docs-2-0/getting-started/design.md b/_docs-2-0/getting-started/design.md
index 26e90484..7f6a880f 100644
--- a/_docs-2-0/getting-started/design.md
+++ b/_docs-2-0/getting-started/design.md
@@ -36,7 +36,7 @@ one Master server and many Clients.
 The TabletServer manages some subset of all the tablets (partitions of tables).
 This includes receiving writes from clients, persisting writes to a
 write-ahead log, sorting new key-value pairs in memory, periodically
 flushing sorted key-value pairs to new files in HDFS, and responding
-to reads from clients, forming a merge-sorted view of all keys and
+to reads from clients, forming a sorted merge view of all keys and
 values from all the files it has created and the sorted in-memory
 store.
@@ -102,7 +102,7 @@ ingest and query load is balanced across the cluster.
 
 ![data distribution]({{ site.url }}/images/docs/data_distribution.png)
 
-## Tablet Service
+## Tablet Server
 
 When a write arrives at a TabletServer it is written to a Write-Ahead Log and
 then inserted into a sorted data structure in memory called a MemTable. When the
@@ -112,10 +112,14 @@ called a minor compaction. A new MemTable is then created and the fact of the
 compaction is recorded in the Write-Ahead Log.
 
 When a request to read data arrives at a TabletServer, the TabletServer does a
-binary search across the MemTable as well as the in-memory indexes associated
-with each RFile to find the relevant values. If clients are performing a scan,
-several key-value pairs are returned to the client in order from the MemTable
-and the set of RFiles by performing a merge-sort as they are read.
+binary search across the MemTable as well as the index blocks associated with each RFile
+to find the relevant values. If clients are performing a scan, several key-value pairs
+are returned to the client in order from the MemTable and data blocks of RFiles by performing
+a sorted merge as they are read. If [caching] is enabled for the table, any index or data
+block is stored in the block cache to speed up future scans.
+
+![tablet server diagram]({{ site.url }}/images/docs/tablet_server.png)
+<!-- Source at https://docs.google.com/presentation/d/1yEBNM044FxrzksVfxU35WDbxcVWUYUMy3tgRP75dzus/edit?usp=sharing -->
 
 ## RFile
 
@@ -178,3 +182,4 @@ TabletServer failures are noted on the Master's monitor page, accessible via
 [clients]: {{page.docs_baseurl}}/getting-started/clients
 [merging]: {{page.docs_baseurl}}/getting-started/table_configuration#merging-tablets
 [compaction]: {{page.docs_baseurl}}/getting-started/table_configuration#compaction
+[caching]: {{page.docs_baseurl}}/administration/caching
diff --git a/images/docs/tablet_server.png b/images/docs/tablet_server.png
new file mode 100644
index 00000000..2581dd01
Binary files /dev/null and b/images/docs/tablet_server.png differ

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
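The caching.md changes in this PR describe a read path in which each read consults index blocks and then data blocks, the index block cache is on by default while the data block cache is opt-in per table, and both caches are shared by every tablet on a tablet server. A minimal Java sketch of that behavior follows, under loudly stated assumptions: every class and method name is hypothetical, each read touches exactly one index block and one data block (real reads may binary search several index blocks), caches are sized in blocks rather than in bytes (unlike tserver.cache.index.size and tserver.cache.data.size), and plain LRU eviction is assumed. It is a toy model, not Accumulo's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a tablet server's index and data block caches (hypothetical names;
// not Accumulo's implementation). Cache capacity is counted in blocks, not bytes.
public class BlockCacheSketch {

  // Fixed-capacity LRU map: LinkedHashMap in access order evicts its eldest entry.
  static final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
      super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
      this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
      return size() > capacity; // drop the least recently used block once over capacity
    }
  }

  // One index cache and one data cache per tablet server, shared by all hosted tablets.
  private final LruCache<String, byte[]> indexCache;
  private final LruCache<String, byte[]> dataCache;
  int hdfsReads = 0; // simulated block fetches from RFiles in HDFS

  BlockCacheSketch(int indexBlocks, int dataBlocks) {
    indexCache = new LruCache<>(indexBlocks);
    dataCache = new LruCache<>(dataBlocks);
  }

  // Stand-in for reading one block of an RFile from HDFS.
  private byte[] readFromHdfs(String blockId) {
    hdfsReads++;
    return blockId.getBytes();
  }

  // Serve a block from the cache when caching is enabled, otherwise always go to HDFS.
  private byte[] getBlock(LruCache<String, byte[]> cache, boolean enabled, String blockId) {
    if (!enabled) {
      return readFromHdfs(blockId);
    }
    byte[] block = cache.get(blockId);
    if (block == null) {
      block = readFromHdfs(blockId);
      cache.put(blockId, block); // keep it for future scans
    }
    return block;
  }

  // One read against an RFile: an index block lookup followed by a data block scan.
  // indexEnabled / dataEnabled play the role of table.cache.index.enable / table.cache.block.enable.
  void read(String rfile, boolean indexEnabled, boolean dataEnabled) {
    getBlock(indexCache, indexEnabled, rfile + "#index0");
    getBlock(dataCache, dataEnabled, rfile + "#data0");
  }

  public static void main(String[] args) {
    BlockCacheSketch tserver = new BlockCacheSketch(100, 100);
    // Table t1 left at the defaults: index cache on, data cache off.
    for (int i = 0; i < 3; i++) {
      tserver.read("t1.rf", true, false);
    }
    int t1Reads = tserver.hdfsReads;
    // Table t2 with table.cache.block.enable=true as well, using the same shared caches.
    for (int i = 0; i < 3; i++) {
      tserver.read("t2.rf", true, true);
    }
    int t2Reads = tserver.hdfsReads - t1Reads;
    System.out.println("t1 (data cache off): " + t1Reads + " HDFS reads"); // 4
    System.out.println("t2 (data cache on): " + t2Reads + " HDFS reads"); // 2
  }
}
```

With the defaults, the index block is fetched once and then served from the cache, but the data block is fetched on every read, so three reads cost four simulated HDFS fetches. With the data block cache also enabled, the same three reads cost two. A LinkedHashMap in access order was chosen only because it gives a compact, correct LRU in the standard library.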