This is an automated email from the ASF dual-hosted git repository.

kturner pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/accumulo-website.git
The following commit(s) were added to refs/heads/master by this push:
     new 61643d7  Document Scan Executors (#92)
61643d7 is described below

commit 61643d764f85dd40018f25c1d9bb49a1d09db1a1
Author: Keith Turner <ke...@deenlo.com>
AuthorDate: Wed Jul 11 10:13:59 2018 -0400

    Document Scan Executors (#92)
---
 README.md                                  |  40 ++++++----
 _docs-2-0/administration/properties.md     |  71 +++++++++---------
 _docs-2-0/administration/scan-executors.md | 114 +++++++++++++++++++++++++++++
 _plugins/links.rb                          |   1 -
 4 files changed, 174 insertions(+), 52 deletions(-)

diff --git a/README.md b/README.md
index c916513..5d6d528 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,32 @@ You can just build static HTML files which are viewable in `_config.yml`:
     cd accumulo-website
     bundle exec jekyll build

+## Custom liquid tags
+
+Custom liquid tags are used to make linking to javadocs, properties, and documents easier.
+The source for these tags is at [_plugins/links.rb](_plugins/links.rb).
+
+| Tag | Description | Options | Examples |
+| ----- | ---------------------- | ------------------------------------------------------------------------------- | ---------------------------------------------------- |
+| jlink | Creates Javadoc link | Link text will be class name by default. Use `-f` for full package + class name | `{% jlink -f org.apache.accumulo.core.client.Connector %}` |
+| jurl | Creates Javadoc URL | None | `{% jurl org.apache.accumulo.core.client.Connector %}` |
+| plink | Creates Property link | Assumes server property by default. Use `-c` to link to client properties | `{% plink -c instance.name %}` |
+| purl | Creates Property URL | Default is servery property. Use `-c` to link to client properties | `{% purl instance.volumes %}` |
+| dlink | Creates Documentation link | None | `{% dlink getting-stared/clients %}` |
+| durl | Creates Documentation URL | None | `{% durl troubleshooting/performance %}` |
+
+## Updating property documentation
+
+Building Accumulo generates `properties.md` and `client-properties.md`. To
+regenertate these, do the following.
+
+```
+cd <accumulo source dir>
+mvn package -DskipTests
+cp ./core/target/generated-docs/properties.md <accumulo website source>/_docs-2-0/administration
+cp ./core/target/generated-docs/client-properties.md <accumulo website source>/_docs-2-0/administration
+```
+
 ## Update the production website

 For Apache Accumulo committers, the `asf-site` branch needs to be updated with the generated
@@ -72,19 +98,5 @@ the given file into your `.git/hook` directory:

     cp ./_devtools/git-hooks/post-commit .git/hooks/

-## Custom liquid tags
-
-Custom liquid tags are used to make linking to javadocs, properties, and documents easier.
-The source for these tags is at [_plugins/links.rb](_plugins/links.rb).
-
-| Tag | Description | Options | Examples |
-| ----- | ---------------------- | ------------------------------------------------------------------------------- | ---------------------------------------------------- |
-| jlink | Creates Javadoc link | Link text will be class name by default. Use `-f` for full package + class name | `{% jlink -f org.apache.accumulo.core.client.Connector %}` |
-| jurl | Creates Javadoc URL | None | `{% jurl org.apache.accumulo.core.client.Connector %}` |
-| plink | Creates Property link | Assumes server property by default. Use `-c` to link to client properties | `{% plink -c instance.name %}` |
-| purl | Creates Property URL | Default is servery property. Use `-c` to link to client properties | `{% purl instance.volumes %}` |
-| dlink | Creates Documentation link | None | `{% dlink getting-stared/clients %}` |
-| durl | Creates Documentation URL | None | `{% durl troubleshooting/performance %}` |
-
 [Jekyll]: https://jekyllrb.com/
 [Bundler]: https://bundler.io/
diff --git a/_docs-2-0/administration/properties.md b/_docs-2-0/administration/properties.md
index c0d567b..c1c6d32 100644
--- a/_docs-2-0/administration/properties.md
+++ b/_docs-2-0/administration/properties.md
@@ -35,17 +35,16 @@ Below are properties set in `accumulo-site.xml` or the Accumulo shell that confi
 | <a name="general_server_simpletimer_threadpool_size" class="prop"></a> general.server.simpletimer.threadpool.size | The number of threads to use for server-internal scheduled tasks<br>**type:** COUNT, **zk mutable:** no, **default value:** `1` |
 | <a name="general_vfs_cache_dir" class="prop"></a> general.vfs.cache.dir | Directory to use for the vfs cache. The cache will keep a soft reference to all of the classes loaded in the VM. This should be on local disk on each node with sufficient space. It defaults to ${java.io.tmpdir}/accumulo-vfs-cache-${user.name}<br>**type:** ABSOLUTEPATH, **zk mutable:** no, **default value:** `${java.io.tmpdir}/accumulo-vfs-cache-${user.name}` |
 | <a name="general_vfs_classpaths" class="prop"></a> general.vfs.classpaths | Configuration for a system level vfs classloader. Accumulo jar can be configured here and loaded out of HDFS.<br>**type:** STRING, **zk mutable:** no, **default value:** empty |
-| <a name="general_vfs_context_classpath_prefix" class="prop"></a> **general.vfs.context.classpath.*** | Properties in this category are define a classpath. These properties start with the category prefix, followed by a context name. The value is a comma seperated list of URIs. Supports full regex on filename alone. For example, general.vfs.context.classpath.cx1=hdfs://nn1:9902/mylibdir/*.jar.
You can enable post delegation for a context, which will load classes from the context first i [...] +| <a name="general_vfs_context_classpath_prefix" class="prop"></a> **general.vfs.context.classpath.*** | Properties in this category are define a classpath. These properties start with the category prefix, followed by a context name. The value is a comma seperated list of URIs. Supports full regex on filename alone. For example, general.vfs.context.classpath.cx1=hdfs://nn1:9902/mylibdir/*.jar. You can enable post delegation for a context, which will load classes from the context first i [...] | <a name="instance_prefix" class="prop"></a> **instance.*** | Properties in this category must be consistent throughout a cloud. This is enforced and servers won't be able to communicate if these differ. | | <a name="instance_dfs_dir" class="prop"></a> instance.dfs.dir | **Deprecated.** ~~HDFS directory in which accumulo instance will run. Do not change after accumulo is initialized.~~<br>~~**type:** ABSOLUTEPATH~~, ~~**zk mutable:** no~~, ~~**default value:** `/accumulo`~~ | | <a name="instance_dfs_uri" class="prop"></a> instance.dfs.uri | **Deprecated.** ~~A url accumulo should use to connect to DFS. If this is empty, accumulo will obtain this information from the hadoop configuration. This property will only be used when creating new files if instance.volumes is empty. After an upgrade to 1.6.0 Accumulo will start using absolute paths to reference files. Files created before a 1.6.0 upgrade are referenced via relative paths. Relative paths will always be r [...] 
| <a name="instance_rpc_sasl_allowed_host_impersonation" class="prop"></a> instance.rpc.sasl.allowed.host.impersonation | One-line configuration property controlling the network locations (hostnames) that are allowed to impersonate other users<br>**type:** STRING, **zk mutable:** no, **default value:** empty | | <a name="instance_rpc_sasl_allowed_user_impersonation" class="prop"></a> instance.rpc.sasl.allowed.user.impersonation | One-line configuration property controlling what users are allowed to impersonate other users<br>**type:** STRING, **zk mutable:** no, **default value:** empty | | <a name="instance_rpc_sasl_enabled" class="prop"></a> instance.rpc.sasl.enabled | Configures Thrift RPCs to require SASL with GSSAPI which supports Kerberos authentication. Mutually exclusive with SSL RPC configuration.<br>**type:** BOOLEAN, **zk mutable:** no, **default value:** `false` | -| <a name="instance_rpc_sasl_impersonation_prefix" class="prop"></a> **instance.rpc.sasl.impersonation.*** | **Deprecated.** ~~Prefix that allows configuration of users that are allowed to impersonate other users~~ | | <a name="instance_rpc_ssl_clientAuth" class="prop"></a> instance.rpc.ssl.clientAuth | Require clients to present certs signed by a trusted root<br>**type:** BOOLEAN, **zk mutable:** no, **default value:** `false` | | <a name="instance_rpc_ssl_enabled" class="prop"></a> instance.rpc.ssl.enabled | Use SSL for socket connections from clients and among accumulo services. Mutually exclusive with SASL RPC configuration.<br>**type:** BOOLEAN, **zk mutable:** no, **default value:** `false` | -| <a name="instance_secret" class="prop"></a> instance.secret | A secret unique to a given instance that all servers must know in order to communicate with one another.It should be changed prior to the initialization of Accumulo. To change it after Accumulo has been initialized, use the ChangeSecret tool and then update accumulo-site.xml everywhere. 
Before using the ChangeSecret tool, make sure Accumulo is not running and you are logged in as the user that controls Accumulo files in HDFS [...] +| <a name="instance_secret" class="prop"></a> instance.secret | A secret unique to a given instance that all servers must know in order to communicate with one another. It should be changed prior to the initialization of Accumulo. To change it after Accumulo has been initialized, use the ChangeSecret tool and then update accumulo-site.xml everywhere. Before using the ChangeSecret tool, make sure Accumulo is not running and you are logged in as the user that controls Accumulo files in HDF [...] | <a name="instance_security_authenticator" class="prop"></a> instance.security.authenticator | The authenticator class that accumulo will use to determine if a user has privilege to perform an action<br>**type:** CLASSNAME, **zk mutable:** no, **default value:** `org.apache.accumulo.server.security.handler.ZKAuthenticator` | | <a name="instance_security_authorizor" class="prop"></a> instance.security.authorizor | The authorizor class that accumulo will use to determine what labels a user has privilege to see<br>**type:** CLASSNAME, **zk mutable:** no, **default value:** `org.apache.accumulo.server.security.handler.ZKAuthorizor` | | <a name="instance_security_permissionHandler" class="prop"></a> instance.security.permissionHandler | The permission handler class that accumulo will use to determine if a user has privilege to perform an action<br>**type:** CLASSNAME, **zk mutable:** no, **default value:** `org.apache.accumulo.server.security.handler.ZKPermHandler` | @@ -64,29 +63,23 @@ Below are properties set in `accumulo-site.xml` or the Accumulo shell that confi | <a name="master_metadata_suspendable" class="prop"></a> master.metadata.suspendable | Allow tablets for the accumulo.metadata table to be suspended via table.suspend.duration.<br>**type:** BOOLEAN, **zk mutable:** yes, **default value:** `false` | | <a 
name="master_port_client" class="prop"></a> master.port.client | The port used for handling client connections on the master<br>**type:** PORT, **zk mutable:** yes but requires restart of the master, **default value:** `9999` | | <a name="master_recovery_delay" class="prop"></a> master.recovery.delay | When a tablet server's lock is deleted, it takes time for it to completely quit. This delay gives it time before log recoveries begin.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `10s` | -| <a name="master_recovery_max_age" class="prop"></a> master.recovery.max.age | Recovery files older than this age will be removed.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `60m` | -| <a name="master_recovery_time_max" class="prop"></a> master.recovery.time.max | The maximum time to attempt recovery before giving up<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `30m` | | <a name="master_replication_coordinator_minthreads" class="prop"></a> master.replication.coordinator.minthreads | Minimum number of threads dedicated to answering coordinator requests<br>**type:** COUNT, **zk mutable:** yes, **default value:** `4` | | <a name="master_replication_coordinator_port" class="prop"></a> master.replication.coordinator.port | Port for the replication coordinator service<br>**type:** PORT, **zk mutable:** yes, **default value:** `10001` | | <a name="master_replication_coordinator_threadcheck_time" class="prop"></a> master.replication.coordinator.threadcheck.time | The time between adjustments of the coordinator thread pool<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `5s` | | <a name="master_replication_status_scan_interval" class="prop"></a> master.replication.status.scan.interval | Amount of time to sleep before scanning the status section of the replication table for new data<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `30s` | | <a name="master_server_threadcheck_time" 
class="prop"></a> master.server.threadcheck.time | The time between adjustments of the server thread pool.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `1s` | | <a name="master_server_threads_minimum" class="prop"></a> master.server.threads.minimum | The minimum number of threads to use to handle incoming requests.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `20` | -| <a name="master_status_threadpool_size" class="prop"></a> master.status.threadpool.size | The number of threads to use when fetching the tablet server status for balancing.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `1` | +| <a name="master_status_threadpool_size" class="prop"></a> master.status.threadpool.size | The number of threads to use when fetching the tablet server status for balancing. Zero indicates an unlimited number of threads will be used.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `0` | | <a name="master_tablet_balancer" class="prop"></a> master.tablet.balancer | The balancer class that accumulo will use to make tablet assignment and migration decisions.<br>**type:** CLASSNAME, **zk mutable:** yes, **default value:** `org.apache.accumulo.server.master.balancer.TableLoadBalancer` | | <a name="master_walog_closer_implementation" class="prop"></a> master.walog.closer.implementation | A class that implements a mechanism to steal write access to a write-ahead log<br>**type:** CLASSNAME, **zk mutable:** yes, **default value:** `org.apache.accumulo.server.master.recovery.HadoopLogCloser` | | <a name="monitor_prefix" class="prop"></a> **monitor.*** | Properties in this category affect the behavior of the monitor web server. 
| -| <a name="monitor_banner_background" class="prop"></a> monitor.banner.background | **Deprecated.** ~~The background color of the banner text displayed on the monitor page.~~<br>~~**type:** STRING~~, ~~**zk mutable:** yes~~, ~~**default value:** `#304065`~~ | -| <a name="monitor_banner_color" class="prop"></a> monitor.banner.color | **Deprecated.** ~~The color of the banner text displayed on the monitor page.~~<br>~~**type:** STRING~~, ~~**zk mutable:** yes~~, ~~**default value:** `#c4c4c4`~~ | -| <a name="monitor_banner_text" class="prop"></a> monitor.banner.text | **Deprecated.** ~~The banner text displayed on the monitor page.~~<br>~~**type:** STRING~~, ~~**zk mutable:** yes~~, ~~**default value:** empty~~ | | <a name="monitor_lock_check_interval" class="prop"></a> monitor.lock.check.interval | The amount of time to sleep between checking for the Montior ZooKeeper lock<br>**type:** TIMEDURATION, **zk mutable:** no, **default value:** `5s` | -| <a name="monitor_log_date_format" class="prop"></a> monitor.log.date.format | The SimpleDateFormat string used to configure the date shown on the 'Recent Logs' monitor page<br>**type:** STRING, **zk mutable:** no, **default value:** `yyyy/MM/dd HH:mm:ss,SSS` | | <a name="monitor_port_client" class="prop"></a> monitor.port.client | The listening port for the monitor's http service<br>**type:** PORT, **zk mutable:** no, **default value:** `9995` | | <a name="monitor_port_log4j" class="prop"></a> monitor.port.log4j | The listening port for the monitor's log4j logging collection.<br>**type:** PORT, **zk mutable:** no, **default value:** `4560` | -| <a name="monitor_resources_external" class="prop"></a> monitor.resources.external | A JSON Map of Strings. Each String should be an HTML tag of an external resource (JS or CSS) to be imported by the Monitor. <br>Be sure to wrap with CDATA tags. If this value is set, all of the external resources in the <head> tag of the Monitor will be replaced with <br>the tags set here. 
Be sure the jquery tag is first since other scripts will depend on it. The resources that are used by default can b [...] -| <a name="monitor_ssl_exclude_ciphers" class="prop"></a> monitor.ssl.exclude.ciphers | A comma-separated list of disallowed SSL Ciphers, see mmonitor.ssl.include.ciphers to allow ciphers<br>**type:** STRING, **zk mutable:** no, **default value:** empty | +| <a name="monitor_resources_external" class="prop"></a> monitor.resources.external | A JSON Map of Strings. Each String should be an HTML tag of an external resource (JS or CSS) to be imported by the Monitor. Be sure to wrap with CDATA tags. If this value is set, all of the external resources in the `<head>` tag of the Monitor will be replaced with the tags set here. Be sure the jquery tag is first since other scripts will depend on it. The resources that are used by default can be seen [...] +| <a name="monitor_ssl_exclude_ciphers" class="prop"></a> monitor.ssl.exclude.ciphers | A comma-separated list of disallowed SSL Ciphers, see monitor.ssl.include.ciphers to allow ciphers<br>**type:** STRING, **zk mutable:** no, **default value:** empty | | <a name="monitor_ssl_include_ciphers" class="prop"></a> monitor.ssl.include.ciphers | A comma-separated list of allows SSL Ciphers, see monitor.ssl.exclude.ciphers to disallow ciphers<br>**type:** STRING, **zk mutable:** no, **default value:** empty | -| <a name="monitor_ssl_include_protocols" class="prop"></a> monitor.ssl.include.protocols | A comma-separate list of allowed SSL protocols<br>**type:** STRING, **zk mutable:** no, **default value:** `TLSv1,TLSv1.1,TLSv1.2` | +| <a name="monitor_ssl_include_protocols" class="prop"></a> monitor.ssl.include.protocols | A comma-separate list of allowed SSL protocols<br>**type:** STRING, **zk mutable:** no, **default value:** `TLSv1.2` | | <a name="monitor_ssl_keyStore" class="prop"></a> monitor.ssl.keyStore | The keystore for enabling monitor SSL.<br>**type:** PATH, **zk mutable:** no, **default value:** 
empty | | <a name="monitor_ssl_keyStorePassword" class="prop"></a> monitor.ssl.keyStorePassword | The keystore password for enabling monitor SSL.<br>**type:** STRING, **zk mutable:** no, **default value:** empty | | <a name="monitor_ssl_keyStoreType" class="prop"></a> monitor.ssl.keyStoreType | Type of SSL keystore<br>**type:** STRING, **zk mutable:** no, **default value:** `jks` | @@ -122,15 +115,15 @@ Below are properties set in `accumulo-site.xml` or the Accumulo shell that confi | <a name="rpc_javax_net_ssl_trustStoreType" class="prop"></a> rpc.javax.net.ssl.trustStoreType | Type of SSL truststore<br>**type:** STRING, **zk mutable:** no, **default value:** `jks` | | <a name="rpc_sasl_qop" class="prop"></a> rpc.sasl.qop | The quality of protection to be used with SASL. Valid values are 'auth', 'auth-int', and 'auth-conf'<br>**type:** STRING, **zk mutable:** no, **default value:** `auth` | | <a name="rpc_ssl_cipher_suites" class="prop"></a> rpc.ssl.cipher.suites | Comma separated list of cipher suites that can be used by accepted connections<br>**type:** STRING, **zk mutable:** no, **default value:** empty | -| <a name="rpc_ssl_client_protocol" class="prop"></a> rpc.ssl.client.protocol | The protocol used to connect to a secure server, must be in the list of enabled protocols on the server side (rpc.ssl.server.enabled.protocols)<br>**type:** STRING, **zk mutable:** no, **default value:** `TLSv1` | -| <a name="rpc_ssl_server_enabled_protocols" class="prop"></a> rpc.ssl.server.enabled.protocols | Comma separated list of protocols that can be used to accept connections<br>**type:** STRING, **zk mutable:** no, **default value:** `TLSv1,TLSv1.1,TLSv1.2` | +| <a name="rpc_ssl_client_protocol" class="prop"></a> rpc.ssl.client.protocol | The protocol used to connect to a secure server, must be in the list of enabled protocols on the server side (rpc.ssl.server.enabled.protocols)<br>**type:** STRING, **zk mutable:** no, **default value:** `TLSv1.2` | +| <a 
name="rpc_ssl_server_enabled_protocols" class="prop"></a> rpc.ssl.server.enabled.protocols | Comma separated list of protocols that can be used to accept connections<br>**type:** STRING, **zk mutable:** no, **default value:** `TLSv1.2` | | <a name="rpc_useJsse" class="prop"></a> rpc.useJsse | Use JSSE system properties to configure SSL rather than the rpc.javax.net.ssl.* Accumulo properties<br>**type:** BOOLEAN, **zk mutable:** no, **default value:** `false` | | <a name="table_prefix" class="prop"></a> **table.*** | Properties in this category affect tablet server treatment of tablets, but can be configured on a per-table basis. Setting these properties in the site file will override the default globally for all tables and not any specific table. However, both the default and the global setting can be overridden per table using the table operations API or in the shell, which sets the overridden value in zookeeper. Restarting accumulo tablet se [...] | <a name="table_balancer" class="prop"></a> table.balancer | This property can be set to allow the LoadBalanceByTable load balancer to change the called Load Balancer for this table<br>**type:** STRING, **zk mutable:** yes, **default value:** `org.apache.accumulo.server.master.balancer.DefaultLoadBalancer` | | <a name="table_bloom_enabled" class="prop"></a> table.bloom.enabled | Use bloom filters on this table.<br>**type:** BOOLEAN, **zk mutable:** yes, **default value:** `false` | | <a name="table_bloom_error_rate" class="prop"></a> table.bloom.error.rate | Bloom filter error rate.<br>**type:** FRACTION, **zk mutable:** yes, **default value:** `0.5%` | | <a name="table_bloom_hash_type" class="prop"></a> table.bloom.hash.type | The bloom filter hash type<br>**type:** STRING, **zk mutable:** yes, **default value:** `murmur` | -| <a name="table_bloom_key_functor" class="prop"></a> table.bloom.key.functor | A function that can transform the key prior to insertion and check of bloom filter. 
org.apache.accumulo.core.file.keyfunctor.RowFunctor,,org.apache.accumulo.core.file.keyfunctor.ColumnFamilyFunctor, and org.apache.accumulo.core.file.keyfunctor.ColumnQualifierFunctor are allowable values. One can extend any of the above mentioned classes to perform specialized parsing of the key. <br>**type:** CLASSNAME, **zk [...] +| <a name="table_bloom_key_functor" class="prop"></a> table.bloom.key.functor | A function that can transform the key prior to insertion and check of bloom filter. org.apache.accumulo.core.file.keyfunctor.RowFunctor, org.apache.accumulo.core.file.keyfunctor.ColumnFamilyFunctor, and org.apache.accumulo.core.file.keyfunctor.ColumnQualifierFunctor are allowable values. One can extend any of the above mentioned classes to perform specialized parsing of the key. <br>**type:** CLASSNAME, **zk [...] | <a name="table_bloom_load_threshold" class="prop"></a> table.bloom.load.threshold | This number of seeks that would actually use a bloom filter must occur before a RFile's bloom filter is loaded. Set this to zero to initiate loading of bloom filters when a RFile is opened.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `1` | | <a name="table_bloom_size" class="prop"></a> table.bloom.size | Bloom filter size, as number of keys.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `1048576` | | <a name="table_cache_block_enable" class="prop"></a> table.cache.block.enable | Determines whether data block cache is enabled for a table.<br>**type:** BOOLEAN, **zk mutable:** yes, **default value:** `false` | @@ -141,23 +134,23 @@ Below are properties set in `accumulo-site.xml` or the Accumulo shell that confi | <a name="table_compaction_minor_idle" class="prop"></a> table.compaction.minor.idle | After a tablet has been idle (no mutations) for this time period it may have its in-memory map flushed to disk in a minor compaction. 
There is no guarantee an idle tablet will be compacted.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `5m` | | <a name="table_compaction_minor_logs_threshold" class="prop"></a> table.compaction.minor.logs.threshold | When there are more than this many write-ahead logs against a tablet, it will be minor compacted. See comment for property tserver.memory.maps.max<br>**type:** COUNT, **zk mutable:** yes, **default value:** `3` | | <a name="table_compaction_minor_merge_file_size_max" class="prop"></a> table.compaction.minor.merge.file.size.max | The max RFile size used for a merging minor compaction. The default value of 0 disables a max file size.<br>**type:** BYTES, **zk mutable:** yes, **default value:** `0` | -| <a name="table_constraint_prefix" class="prop"></a> **table.constraint.*** | Properties in this category are per-table properties that add constraints to a table. These properties start with the category prefix, followed by a number, and their values correspond to a fully qualified Java class that implements the Constraint interface.<br>For example:<br>table.constraint.1 = org.apache.accumulo.core.constraints.MyCustomConstraint<br>and:<br>table.constraint.2 = my.package.constraints.MyS [...] +| <a name="table_constraint_prefix" class="prop"></a> **table.constraint.*** | Properties in this category are per-table properties that add constraints to a table. These properties start with the category prefix, followed by a number, and their values correspond to a fully qualified Java class that implements the Constraint interface.<br>For example:<br>table.constraint.1 = org.apache.accumulo.core.constraints.MyCustomConstraint<br>and:<br> table.constraint.2 = my.package.constraints.My [...] | <a name="table_custom_prefix" class="prop"></a> **table.custom.*** | Prefix to be used for user defined arbitrary properties. | | <a name="table_durability" class="prop"></a> table.durability | The durability used to write to the write-ahead log. 
Legal values are: none, which skips the write-ahead log; log, which sends the data to the write-ahead log, but does nothing to make it durable; flush, which pushes data to the file system; and sync, which ensures the data is written to disk.<br>**type:** DURABILITY, **zk mutable:** yes, **default value:** `sync` | | <a name="table_failures_ignore" class="prop"></a> table.failures.ignore | If you want queries for your table to hang or fail when data is missing from the system, then set this to false. When this set to true missing data will be reported but queries will still run possibly returning a subset of the data.<br>**type:** BOOLEAN, **zk mutable:** yes, **default value:** `false` | | <a name="table_file_blocksize" class="prop"></a> table.file.blocksize | The HDFS block size used when writing RFiles. When set to 0B, the value/defaults of HDFS property 'dfs.block.size' will be used.<br>**type:** BYTES, **zk mutable:** yes, **default value:** `0B` | | <a name="table_file_compress_blocksize" class="prop"></a> table.file.compress.blocksize | The maximum size of data blocks in RFiles before they are compressed and written.<br>**type:** BYTES, **zk mutable:** yes, **default value:** `100K` | | <a name="table_file_compress_blocksize_index" class="prop"></a> table.file.compress.blocksize.index | The maximum size of index blocks in RFiles before they are compressed and written.<br>**type:** BYTES, **zk mutable:** yes, **default value:** `128K` | -| <a name="table_file_compress_type" class="prop"></a> table.file.compress.type | Compression algorithm used on index and data blocks before they are written. Possible values: gz, snappy, lzo, none<br>**type:** STRING, **zk mutable:** yes, **default value:** `gz` | +| <a name="table_file_compress_type" class="prop"></a> table.file.compress.type | Compression algorithm used on index and data blocks before they are written. 
Possible values: zstd, gz, snappy, lzo, none<br>**type:** STRING, **zk mutable:** yes, **default value:** `gz` | | <a name="table_file_max" class="prop"></a> table.file.max | The maximum number of RFiles each tablet in a table can have. When adjusting this property you may want to consider adjusting table.compaction.major.ratio also. Setting this property to 0 will make it default to tserver.scan.files.open.max-1, this will prevent a tablet from having more RFiles than can be opened. Setting this property low may throttle ingest and increase query performance.<br>**type:** COUNT, **zk mutable:** ye [...] | <a name="table_file_replication" class="prop"></a> table.file.replication | The number of replicas for a table's RFiles in HDFS. When set to 0, HDFS defaults are used.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `0` | -| <a name="table_file_summary_maxSize" class="prop"></a> table.file.summary.maxSize | The maximum size summary that will be stored. The number of RFiles that had summary data exceeding this threshold is reported by Summary.getFileStatistics().getLarge(). When adjusting this consider the expected number RFiles with summaries on each tablet server and the summary cache size.<br>**type:** BYTES, **zk mutable:** yes, **default value:** `256K` | +| <a name="table_file_summary_maxSize" class="prop"></a> table.file.summary.maxSize | The maximum size summary that will be stored. The number of RFiles that had summary data exceeding this threshold is reported by Summary.getFileStatistics().getLarge(). 
When adjusting this consider the expected number RFiles with summaries on each tablet server and the summary cache size.<br>**type:** BYTES, **zk mutable:** yes, **default value:** `256K` | | <a name="table_file_type" class="prop"></a> table.file.type | Change the type of file a table writes<br>**type:** STRING, **zk mutable:** yes, **default value:** `rf` | | <a name="table_formatter" class="prop"></a> table.formatter | The Formatter class to apply on results in the shell<br>**type:** STRING, **zk mutable:** yes, **default value:** `org.apache.accumulo.core.util.format.DefaultFormatter` | -| <a name="table_group_prefix" class="prop"></a> **table.group.*** | Properties in this category are per-table properties that define locality groups in a table. These properties start with the category prefix, followed by a name, followed by a period, and followed by a property for that group.<br>For example table.group.group1=x,y,z sets the column families for a group called group1. Once configured, group1 can be enabled by adding it to the list of groups in the table.groups.enabled pr [...] +| <a name="table_group_prefix" class="prop"></a> **table.group.*** | Properties in this category are per-table properties that define locality groups in a table. These properties start with the category prefix, followed by a name, followed by a period, and followed by a property for that group.<br>For example table.group.group1=x,y,z sets the column families for a group called group1. Once configured, group1 can be enabled by adding it to the list of groups in the table.groups.enabled pr [...] 
| <a name="table_groups_enabled" class="prop"></a> table.groups.enabled | A comma separated list of locality group names to enable for this table.<br>**type:** STRING, **zk mutable:** yes, **default value:** empty | | <a name="table_interepreter" class="prop"></a> table.interepreter | The ScanInterpreter class to apply on scan arguments in the shell<br>**type:** STRING, **zk mutable:** yes, **default value:** `org.apache.accumulo.core.util.interpret.DefaultScanInterpreter` | -| <a name="table_iterator_prefix" class="prop"></a> **table.iterator.*** | Properties in this category specify iterators that are applied at various stages (scopes) of interaction with a table. These properties start with the category prefix, followed by a scope (minc, majc, scan, etc.), followed by a period, followed by a name, as in table.iterator.scan.vers, or table.iterator.scan.custom. The values for these properties are a number indicating the ordering in which it is applied, and a [...] +| <a name="table_iterator_prefix" class="prop"></a> **table.iterator.*** | Properties in this category specify iterators that are applied at various stages (scopes) of interaction with a table. These properties start with the category prefix, followed by a scope (minc, majc, scan, etc.), followed by a period, followed by a name, as in table.iterator.scan.vers, or table.iterator.scan.custom. The values for these properties are a number indicating the ordering in which it is applied, and a [...] 
| <a name="table_iterator_majc_prefix" class="prop"></a> **table.iterator.majc.*** | Convenience prefix to find options for the majc iterator scope | | <a name="table_iterator_minc_prefix" class="prop"></a> **table.iterator.minc.*** | Convenience prefix to find options for the minc iterator scope | | <a name="table_iterator_scan_prefix" class="prop"></a> **table.iterator.scan.*** | Convenience prefix to find options for the scan iterator scope | @@ -165,14 +158,16 @@ Below are properties set in `accumulo-site.xml` or the Accumulo shell that confi | <a name="table_majc_compaction_strategy_opts_prefix" class="prop"></a> **table.majc.compaction.strategy.opts.*** | Properties in this category are used to configure the compaction strategy. | | <a name="table_replication" class="prop"></a> table.replication | Is replication enabled for the given table<br>**type:** BOOLEAN, **zk mutable:** yes, **default value:** `false` | | <a name="table_replication_target_prefix" class="prop"></a> **table.replication.target.*** | Enumerate a mapping of other systems which this table should replicate their data to. The key suffix is the identifying cluster name and the value is an identifier for a location on the target system, e.g. the ID of the table on the target to replicate to | -| <a name="table_sampler" class="prop"></a> table.sampler | The name of a class that implements org.apache.accumulo.core.Sampler. Setting this option enables storing a sample of data which can be scanned. Always having a current sample can useful for query optimization and data comprehension. After enabling sampling for an existing table, a compaction is needed to compute the sample for existing data. The compact command in the shell has an option to only compact RFiles without samp [...] -| <a name="table_sampler_opt_prefix" class="prop"></a> **table.sampler.opt.*** | The property is used to set options for a sampler. 
If a sample had two options like hasher and modulous, then the two properties table.sampler.opt.hasher=${hash algorithm} and table.sampler.opt.modulous=${mod} would be set. | +| <a name="table_sampler" class="prop"></a> table.sampler | The name of a class that implements org.apache.accumulo.core.Sampler. Setting this option enables storing a sample of data which can be scanned. Always having a current sample can useful for query optimization and data comprehension. After enabling sampling for an existing table, a compaction is needed to compute the sample for existing data. The compact command in the shell has an option to only compact RFiles without sample da [...] +| <a name="table_sampler_opt_prefix" class="prop"></a> **table.sampler.opt.*** | The property is used to set options for a sampler. If a sample had two options like hasher and modulous, then the two properties table.sampler.opt.hasher=${hash algorithm} and table.sampler.opt.modulous=${mod} would be set. | +| <a name="table_scan_dispatcher" class="prop"></a> table.scan.dispatcher | This class is used to dynamically dispatch scans to configured scan executors. Configured classes must implement {% jlink org.apache.accumulo.core.spi.scan.ScanDispatcher %} See [scan executors]({% durl administration/scan-executors %}) for an overview of why and how to use this property. This property is ignored for the root and metadata table. The metadata table always dispatches to a scan executor named `met [...] +| <a name="table_scan_dispatcher_opts_prefix" class="prop"></a> **table.scan.dispatcher.opts.*** | Options for the table scan dispatcher | | <a name="table_scan_max_memory" class="prop"></a> table.scan.max.memory | The maximum amount of memory that will be used to cache results of a client query/scan. 
Once this limit is reached, the buffered data is sent to the client.<br>**type:** BYTES, **zk mutable:** yes, **default value:** `512K` | -| <a name="table_security_scan_visibility_default" class="prop"></a> table.security.scan.visibility.default | The security label that will be assumed at scan time if an entry does not have a visibility set.<br>Note: An empty security label is displayed as []. The scan results will show an empty visibility even if the visibility from this setting is applied to the entry.<br>CAUTION: If a particular key has an empty security label AND its table's default visibility is also empty, access wi [...] +| <a name="table_security_scan_visibility_default" class="prop"></a> table.security.scan.visibility.default | The security label that will be assumed at scan time if an entry does not have a visibility expression.<br>Note: An empty security label is displayed as []. The scan results will show an empty visibility even if the visibility from this setting is applied to the entry.<br>CAUTION: If a particular key has an empty security label AND its table's default visibility is also empty, ac [...] | <a name="table_split_endrow_size_max" class="prop"></a> table.split.endrow.size.max | Maximum size of end row<br>**type:** BYTES, **zk mutable:** yes, **default value:** `10K` | | <a name="table_split_threshold" class="prop"></a> table.split.threshold | A tablet is split when the combined size of RFiles exceeds this amount.<br>**type:** BYTES, **zk mutable:** yes, **default value:** `1G` | -| <a name="table_summarizer_prefix" class="prop"></a> **table.summarizer.*** | Prefix for configuring summarizers for a table. Using this prefix multiple summarizers can be configured with options for each one. Each summarizer configured should have a unique id, this id can be anything. To add a summarizer set table.summarizer.<unique id>=<summarizer class name>. 
If the summarizer has options, then for each option set table.summarizer.<unique id>.opt.<key>=<value>. | -| <a name="table_suspend_duration" class="prop"></a> table.suspend.duration | For tablets belonging to this table: When a tablet server dies, allow the tablet server this duration to revive before reassigning its tabletsto other tablet servers.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `0s` | +| <a name="table_summarizer_prefix" class="prop"></a> **table.summarizer.*** | Prefix for configuring summarizers for a table. Using this prefix multiple summarizers can be configured with options for each one. Each summarizer configured should have a unique id, this id can be anything. To add a summarizer set `table.summarizer.<unique id>=<summarizer class name>.` If the summarizer has options, then for each option set `table.summarizer.<unique id>.opt.<key>=<value>`. | +| <a name="table_suspend_duration" class="prop"></a> table.suspend.duration | For tablets belonging to this table: When a tablet server dies, allow the tablet server this duration to revive before reassigning its tablets to other tablet servers.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `0s` | | <a name="table_walog_enabled" class="prop"></a> table.walog.enabled | **Deprecated.** ~~This setting is deprecated. Use table.durability=none instead.~~<br>~~**type:** BOOLEAN~~, ~~**zk mutable:** yes~~, ~~**default value:** `true`~~ | | <a name="trace_prefix" class="prop"></a> **trace.*** | Properties in this category affect the behavior of distributed tracing. 
| | <a name="trace_password" class="prop"></a> trace.password | The password for the user used to store distributed traces<br>**type:** STRING, **zk mutable:** no, **default value:** `secret` | @@ -181,13 +176,12 @@ Below are properties set in `accumulo-site.xml` or the Accumulo shell that confi | <a name="trace_span_receivers" class="prop"></a> trace.span.receivers | A list of span receiver classes to send trace spans<br>**type:** CLASSNAMELIST, **zk mutable:** no, **default value:** `org.apache.accumulo.tracer.ZooTraceClient` | | <a name="trace_table" class="prop"></a> trace.table | The name of the table to store distributed traces<br>**type:** STRING, **zk mutable:** no, **default value:** `trace` | | <a name="trace_token_property_prefix" class="prop"></a> **trace.token.property.*** | The prefix used to create a token for storing distributed traces. For each property required by trace.token.type, place this prefix in front of it. | -| <a name="trace_token_type" class="prop"></a> trace.token.type | An AuthenticationToken type supported by the authorizer<br>**type:** CLASSNAME, **zk mutable:** no, **default value:** `org.apache.accumulo.core.client.security.tokens.PasswordToken` | +| <a name="trace_token_type" class="prop"></a> trace.token.type | An AuthenticationToken type supported by the authorizer<br>**type:** CLASSNAME, **zk mutable:** no, **default value:** {% jlink -f org.apache.accumulo.core.client.security.tokens.PasswordToken %} | | <a name="trace_user" class="prop"></a> trace.user | The name of the user to store distributed traces<br>**type:** STRING, **zk mutable:** no, **default value:** `root` | | <a name="trace_zookeeper_path" class="prop"></a> trace.zookeeper.path | The zookeeper node where tracers are registered<br>**type:** STRING, **zk mutable:** no, **default value:** `/tracers` | | <a name="tserver_prefix" class="prop"></a> **tserver.*** | Properties in this category affect the behavior of the tablet servers | -| <a 
name="tserver_archive_walogs" class="prop"></a> tserver.archive.walogs | Keep copies of the WALOGs for debugging purposes<br>**type:** BOOLEAN, **zk mutable:** yes, **default value:** `false` | | <a name="tserver_assignment_concurrent_max" class="prop"></a> tserver.assignment.concurrent.max | The number of threads available to load tablets. Recoveries are still performed serially.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `2` | -| <a name="tserver_assignment_duration_warning" class="prop"></a> tserver.assignment.duration.warning | The amount of time an assignment can run before the server will print a warning along with the current stack trace. Meant to help debug stuck assignments<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `10m` | +| <a name="tserver_assignment_duration_warning" class="prop"></a> tserver.assignment.duration.warning | The amount of time an assignment can run before the server will print a warning along with the current stack trace. Meant to help debug stuck assignments<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `10m` | | <a name="tserver_bloom_load_concurrent_max" class="prop"></a> tserver.bloom.load.concurrent.max | The number of concurrent threads that will load bloom filters in the background. Setting this to zero will make bloom filters load in the foreground.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `4` | | <a name="tserver_bulk_assign_threads" class="prop"></a> tserver.bulk.assign.threads | The master delegates bulk import RFile processing and assignment to tablet servers. After file has been processed, the tablet server will assign the file to the appropriate tablets on all servers. 
This property controls the number of threads used to communicate to the other servers.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `1` | | <a name="tserver_bulk_process_threads" class="prop"></a> tserver.bulk.process.threads | The master will task a tablet server with pre-processing a bulk import RFile prior to assigning it to the appropriate tablet servers. This configuration value controls the number of threads used to process the files.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `1` | @@ -213,17 +207,20 @@ Below are properties set in `accumulo-site.xml` or the Accumulo shell that confi | <a name="tserver_memory_manager" class="prop"></a> tserver.memory.manager | An implementation of MemoryManger that accumulo will use.<br>**type:** CLASSNAME, **zk mutable:** yes, **default value:** `org.apache.accumulo.server.tabletserver.LargestFirstMemoryManager` | | <a name="tserver_memory_maps_max" class="prop"></a> tserver.memory.maps.max | Maximum amount of memory that can be used to buffer data written to a tablet server. There are two other properties that can effectively limit memory usage table.compaction.minor.logs.threshold and tserver.walog.max.size. 
Ensure that table.compaction.minor.logs.threshold * tserver.walog.max.size >= this property.<br>**type:** MEMORY, **zk mutable:** yes, **default value:** `33%` | | <a name="tserver_memory_maps_native_enabled" class="prop"></a> tserver.memory.maps.native.enabled | An in-memory data store for accumulo implemented in c++ that increases the amount of data accumulo can hold in memory and avoids Java GC pauses.<br>**type:** BOOLEAN, **zk mutable:** yes but requires restart of the tserver, **default value:** `true` | -| <a name="tserver_metadata_readahead_concurrent_max" class="prop"></a> tserver.metadata.readahead.concurrent.max | The maximum number of concurrent metadata read ahead that will execute.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `8` | +| <a name="tserver_metadata_readahead_concurrent_max" class="prop"></a> tserver.metadata.readahead.concurrent.max | **Deprecated.** ~~This property is deprecated since 2.0.0, use tserver.scan.executors.meta.threads instead. The maximum number of concurrent metadata read ahead that will execute.~~<br>~~**type:** COUNT~~, ~~**zk mutable:** yes~~, ~~**default value:** `8`~~ | | <a name="tserver_migrations_concurrent_max" class="prop"></a> tserver.migrations.concurrent.max | The maximum number of concurrent tablet migrations for a tablet server<br>**type:** COUNT, **zk mutable:** yes, **default value:** `1` | | <a name="tserver_monitor_fs" class="prop"></a> tserver.monitor.fs | When enabled the tserver will monitor file systems and kill itself when one switches from rw to ro. This is usually and indication that Linux has detected a bad disk.<br>**type:** BOOLEAN, **zk mutable:** yes, **default value:** `true` | -| <a name="tserver_mutation_queue_max" class="prop"></a> tserver.mutation.queue.max | **Deprecated.** ~~This setting is deprecated. See tserver.total.mutation.queue.max. The amount of memory to use to store write-ahead-log mutations-per-session before flushing them. 
Since the buffer is per write session, consider the max number of concurrent writer when configuring. When using Hadoop 2, Accumulo will call hsync() on the WAL . For a small number of concurrent writers, increasing this buff [...] | <a name="tserver_port_client" class="prop"></a> tserver.port.client | The port used for handling client connections on the tablet servers<br>**type:** PORT, **zk mutable:** yes but requires restart of the tserver, **default value:** `9997` | | <a name="tserver_port_search" class="prop"></a> tserver.port.search | if the ports above are in use, search higher ports until one is available<br>**type:** BOOLEAN, **zk mutable:** yes, **default value:** `false` | -| <a name="tserver_readahead_concurrent_max" class="prop"></a> tserver.readahead.concurrent.max | The maximum number of concurrent read ahead that will execute. This effectively limits the number of long running scans that can run concurrently per tserver.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `16` | +| <a name="tserver_readahead_concurrent_max" class="prop"></a> tserver.readahead.concurrent.max | **Deprecated.** ~~This property is deprecated since 2.0.0, use tserver.scan.executors.default.threads instead. The maximum number of concurrent read ahead that will execute. 
This effectively limits the number of long running scans that can run concurrently per tserver."~~<br>~~**type:** COUNT~~, ~~**zk mutable:** yes~~, ~~**default value:** `16`~~ | | <a name="tserver_recovery_concurrent_max" class="prop"></a> tserver.recovery.concurrent.max | The maximum number of threads to use to sort logs during recovery<br>**type:** COUNT, **zk mutable:** yes, **default value:** `2` | | <a name="tserver_replication_batchwriter_replayer_memory" class="prop"></a> tserver.replication.batchwriter.replayer.memory | Memory to provide to batchwriter to replay mutations for replication<br>**type:** BYTES, **zk mutable:** yes, **default value:** `50M` | | <a name="tserver_replication_default_replayer" class="prop"></a> tserver.replication.default.replayer | Default AccumuloReplicationReplayer implementation<br>**type:** CLASSNAME, **zk mutable:** yes, **default value:** `org.apache.accumulo.tserver.replication.BatchWriterReplicationReplayer` | | <a name="tserver_replication_replayer_prefix" class="prop"></a> **tserver.replication.replayer.*** | Allows configuration of implementation used to apply replicated data | +| <a name="tserver_scan_executors_prefix" class="prop"></a> **tserver.scan.executors.*** | Prefix for defining executors to service scans. See [scan executors]({% durl administration/scan-executors %}) for an overview of why and how to use this property. For each executor the number of threads, thread priority, and an optional prioritizer can be configured. To configure a new executor, set `tserver.scan.executors.<name>.threads=<number>`. Optionally, can also set `tserver.scan.executors [...] +| <a name="tserver_scan_executors_default_prioritizer" class="prop"></a> tserver.scan.executors.default.prioritizer | Prioritizer for the default scan executor. Defaults to none which results in FIFO priority. 
Set to a class that implements org.apache.accumulo.core.spi.scan.ScanPrioritizer to configure one.<br>**type:** STRING, **zk mutable:** yes, **default value:** empty | +| <a name="tserver_scan_executors_default_threads" class="prop"></a> tserver.scan.executors.default.threads | The number of threads for the scan executor that tables use by default.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `16` | +| <a name="tserver_scan_executors_meta_threads" class="prop"></a> tserver.scan.executors.meta.threads | The number of threads for the metadata table scan executor.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `8` | | <a name="tserver_scan_files_open_max" class="prop"></a> tserver.scan.files.open.max | Maximum total RFiles that all tablets in a tablet server can open for scans. <br>**type:** COUNT, **zk mutable:** yes but requires restart of the tserver, **default value:** `100` | | <a name="tserver_server_message_size_max" class="prop"></a> tserver.server.message.size.max | The maximum size of a message that can be sent to a tablet server.<br>**type:** BYTES, **zk mutable:** yes, **default value:** `1G` | | <a name="tserver_server_threadcheck_time" class="prop"></a> tserver.server.threadcheck.time | The time between adjustments of the server thread pool.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `1s` | @@ -232,8 +229,8 @@ Below are properties set in `accumulo-site.xml` or the Accumulo shell that confi | <a name="tserver_session_update_idle_max" class="prop"></a> tserver.session.update.idle.max | When a tablet server's SimpleTimer thread triggers to check idle sessions, this configurable option will be used to evaluate update sessions to determine if they can be closed due to inactivity<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `1m` | | <a name="tserver_slow_flush_time" class="prop"></a> tserver.slow.flush.time | If a flush to the write-ahead log takes longer than this period of time, 
debugging information will written, and may result in a log rollover.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `100ms` | | <a name="tserver_sort_buffer_size" class="prop"></a> tserver.sort.buffer.size | The amount of memory to use when sorting logs during recovery.<br>**type:** MEMORY, **zk mutable:** yes, **default value:** `10%` | -| <a name="tserver_summary_partition_threads" class="prop"></a> tserver.summary.partition.threads | Summary data must be retrieved from RFiles. For a large number of RFiles, the files are broken into partitions of 100K files. This setting determines how many of these groups of 100K RFiles will be processed concurrently.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `10` | -| <a name="tserver_summary_remote_threads" class="prop"></a> tserver.summary.remote.threads | For a partitioned group of 100K RFiles, those files are grouped by tablet server. Then a remote tablet server is asked to gather summary data. This setting determines how many concurrent request are made per partition.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `128` | +| <a name="tserver_summary_partition_threads" class="prop"></a> tserver.summary.partition.threads | Summary data must be retrieved from RFiles. For a large number of RFiles, the files are broken into partitions of 100K files. This setting determines how many of these groups of 100K RFiles will be processed concurrently.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `10` | +| <a name="tserver_summary_remote_threads" class="prop"></a> tserver.summary.remote.threads | For a partitioned group of 100K RFiles, those files are grouped by tablet server. Then a remote tablet server is asked to gather summary data. 
This setting determines how many concurrent request are made per partition.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `128` | | <a name="tserver_summary_retrieval_threads" class="prop"></a> tserver.summary.retrieval.threads | The number of threads on each tablet server available to retrieve summary data, that is not currently in cache, from RFiles.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `10` | | <a name="tserver_tablet_split_midpoint_files_max" class="prop"></a> tserver.tablet.split.midpoint.files.max | To find a tablets split points, all RFiles are opened and their indexes are read. This setting determines how many RFiles can be opened at once. When there are more RFiles than this setting multiple passes must be made, which is slower. However opening too many RFiles at once can cause problems.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `300` | | <a name="tserver_total_mutation_queue_max" class="prop"></a> tserver.total.mutation.queue.max | The amount of memory used to store write-ahead-log mutations before flushing them.<br>**type:** MEMORY, **zk mutable:** yes, **default value:** `5%` | @@ -243,9 +240,9 @@ Below are properties set in `accumulo-site.xml` or the Accumulo shell that confi | <a name="tserver_wal_sync_method" class="prop"></a> tserver.wal.sync.method | **Deprecated.** ~~This property is deprecated. Use table.durability instead.~~<br>~~**type:** STRING~~, ~~**zk mutable:** yes~~, ~~**default value:** `hsync`~~ | | <a name="tserver_walog_max_age" class="prop"></a> tserver.walog.max.age | The maximum age for each write-ahead log.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `24h` | | <a name="tserver_walog_max_size" class="prop"></a> tserver.walog.max.size | The maximum size for each write-ahead log. 
See comment for property tserver.memory.maps.max<br>**type:** BYTES, **zk mutable:** yes, **default value:** `1g` | -| <a name="tserver_walog_maximum_wait_duration" class="prop"></a> tserver.walog.maximum.wait.duration | The maximum amount of time to wait after a failure to create a write-ahead log.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `5m` | -| <a name="tserver_walog_tolerated_creation_failures" class="prop"></a> tserver.walog.tolerated.creation.failures | The maximum number of failures tolerated when creating a new write-ahead log within the period specified by tserver.walog.failures.period. Exceeding this number of failures in the period causes the TabletServer to exit.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `50` | -| <a name="tserver_walog_tolerated_wait_increment" class="prop"></a> tserver.walog.tolerated.wait.increment | The amount of time to wait between failures to create a WALog.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `1000ms` | +| <a name="tserver_walog_maximum_wait_duration" class="prop"></a> tserver.walog.maximum.wait.duration | The maximum amount of time to wait after a failure to create or write a write-ahead log.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `5m` | +| <a name="tserver_walog_tolerated_creation_failures" class="prop"></a> tserver.walog.tolerated.creation.failures | The maximum number of failures tolerated when creating a new write-ahead log. Negative values will allow unlimited creation failures. 
Exceeding this number of failures consecutively trying to create a new write-ahead log causes the TabletServer to exit.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `50` | +| <a name="tserver_walog_tolerated_wait_increment" class="prop"></a> tserver.walog.tolerated.wait.increment | The amount of time to wait between failures to create or write a write-ahead log.<br>**type:** TIMEDURATION, **zk mutable:** yes, **default value:** `1000ms` | | <a name="tserver_workq_threads" class="prop"></a> tserver.workq.threads | The number of threads for the distributed work queue. These threads are used for copying failed bulk import RFiles.<br>**type:** COUNT, **zk mutable:** yes, **default value:** `2` | ### Property Types diff --git a/_docs-2-0/administration/scan-executors.md b/_docs-2-0/administration/scan-executors.md new file mode 100644 index 0000000..be8d4ec --- /dev/null +++ b/_docs-2-0/administration/scan-executors.md @@ -0,0 +1,114 @@ +--- +title: Scan Executors +category: administration +order: 13 +--- + +Accumulo scans operate by repeatedly fetching batches of data from a [tablet +server][tserver]. On the tablet server side, a thread pool fetches batches. +In Java, thread pools are called executors. By default, a single executor per +tablet server handles all scans in FIFO order. For some workloads, the single +FIFO executor is suboptimal. For example, consider many unimportant scans +reading lots of data mixed with a few important scans reading small amounts of +data. The long scans noticeably increase the latency of the short scans. +Accumulo offers two mechanisms to help improve situations like this: multiple +scan executors and per-executor prioritizers. Additional scan executors can +give tables dedicated resources. For each scan executor, an optional +prioritizer can reorder queued work. + +### Configuring and using Scan Executors + +By default, Accumulo sets `tserver.scan.executors.default.threads=16` which +creates the default scan executor.
To configure additional scan executors, +choose a unique name and configure [tserver.scan.executors.*]({% purl tserver.scan.executors.prefix %}). Setting +the following causes each tablet server to create a scan executor with the +specified threads. + +``` +tserver.scan.executors.<name>.threads=<number> +``` + +Optionally, some of the following can be set. The `priority` setting +determines thread priority. The `prioritizer` setting specifies a class that +orders pending work. + +``` +tserver.scan.executors.<name>.priority=<number 1 to 10> +tserver.scan.executors.<name>.prioritizer=<class name> +tserver.scan.executors.<name>.prioritizer.opts.<key>=<value> +``` + +After creating an executor, configure {% plink table.scan.dispatcher %} to use it. A +dispatcher is a Java class implementing {% jlink org.apache.accumulo.core.spi.scan.ScanDispatcher %} +that decides which scan executor should service a table. Set the following table +property to configure a dispatcher. + +``` +table.scan.dispatcher=<class name> +``` + +Scan dispatcher options can be set with properties like the following. + +``` +table.scan.dispatcher.opts.<key>=<value> +``` + +The default value for `table.scan.dispatcher` is {% jlink org.apache.accumulo.core.spi.scan.SimpleScanDispatcher %}. +SimpleScanDispatcher supports an `executor` option for choosing a scan +executor. If this option is not set, then SimpleScanDispatcher will dispatch +to the scan executor named `default`. + +To tie everything together, consider the following use case. + + * Create tables named LOW1 and LOW2 using a scan executor with a single thread. + * Create a table named HIGH with a dedicated scan executor with 8 threads. + * Create tables named NORM1 and NORM2 using the default scan executor. + * Set the default executor to 4 threads. + +The following shell commands implement this use case.
+ +``` +createtable LOW1 +createtable LOW2 +createtable HIGH +createtable NORM1 +createtable NORM2 +config -s tserver.scan.executors.default.threads=4 +config -s tserver.scan.executors.low.threads=1 +config -s tserver.scan.executors.high.threads=8 +``` + +Restart the tablet servers after configuring scan executors, and then configure the tables. + +``` +config -t LOW1 -s table.scan.dispatcher.opts.executor=low +config -t LOW2 -s table.scan.dispatcher.opts.executor=low +config -t HIGH -s table.scan.dispatcher.opts.executor=high +``` + +While not necessary because it's the default, it would be safer to also set +`table.scan.dispatcher=org.apache.accumulo.core.spi.scan.SimpleScanDispatcher` +for each table. This ensures things work as expected in the case where +`table.scan.dispatcher` was set at the system or namespace level. + +### Configuring and using Scan Prioritizers + +When all scan executor threads are busy, incoming work is queued. By +default this queue has a FIFO order. A {% jlink org.apache.accumulo.core.spi.scan.ScanPrioritizer %} can be configured to +reorder the queue. Accumulo ships with the {% jlink org.apache.accumulo.core.spi.scan.IdleRatioScanPrioritizer %} which +orders the queue by the ratio of run time to idle time. For example, a scan +with a run time of 50ms and an idle time of 200ms would have a ratio of .25. +If .25 were the lowest ratio on the queue, then it would be the next in line. +The following configures the IdleRatioScanPrioritizer for the `default` scan +executor. + +``` +tserver.scan.executors.default.prioritizer=org.apache.accumulo.core.spi.scan.IdleRatioScanPrioritizer +``` + +Using the IdleRatioScanPrioritizer in a test with 50 long running scans and 5 +threads repeatedly doing small random lookups made a significant difference. +In this test the average lookup time for the 5 threads went from 250ms to 5ms.
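As a rough illustration of the ordering described above, the following self-contained Java sketch sorts queued scans by the ratio of run time to idle time, lowest ratio first. It is not the IdleRatioScanPrioritizer implementation and does not use the Accumulo SPI. The names `IdleRatioSketch`, `QueuedScan`, and `drainOrder` are hypothetical, and treating an idle time of zero as 1ms is an assumption made only to avoid dividing by zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class IdleRatioSketch {

    // Hypothetical stand-in for a queued scan; this is not an Accumulo class.
    static final class QueuedScan {
        final String name;
        final long runTimeMs;
        final long idleTimeMs;

        QueuedScan(String name, long runTimeMs, long idleTimeMs) {
            this.name = name;
            this.runTimeMs = runTimeMs;
            this.idleTimeMs = idleTimeMs;
        }

        // Ratio of run time to idle time. Assumption of this sketch:
        // an idle time of zero is treated as 1ms to avoid dividing by zero.
        double ratio() {
            return (double) runTimeMs / Math.max(1L, idleTimeMs);
        }
    }

    // Returns scan names in the order a lowest-ratio-first queue would serve them.
    static List<String> drainOrder(List<QueuedScan> scans) {
        Comparator<QueuedScan> byRatio = Comparator.comparingDouble(QueuedScan::ratio);
        PriorityQueue<QueuedScan> queue = new PriorityQueue<>(byRatio);
        queue.addAll(scans);

        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order;
    }

    public static void main(String[] args) {
        List<QueuedScan> scans = Arrays.asList(
            new QueuedScan("long-scan", 5000, 100),   // ratio 50.0
            new QueuedScan("lookup", 50, 200),        // ratio 0.25
            new QueuedScan("medium-scan", 400, 400)); // ratio 1.0

        // Prints [lookup, medium-scan, long-scan]
        System.out.println(drainOrder(scans));
    }
}
```

The `lookup` entry uses the same numbers as the example in the text (50ms run time, 200ms idle time, ratio .25), so it is served ahead of the scans that have spent more of their time running.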
+ +[tserver]: {{ page.docs_baseurl }}/getting-started/design#tablet-server-1 + diff --git a/_plugins/links.rb b/_plugins/links.rb index afb2f1a..7757358 100755 --- a/_plugins/links.rb +++ b/_plugins/links.rb @@ -20,7 +20,6 @@ def render_javadoc(context, text, url_only) short = true if not url_only args = text.strip.split(' ', 2) - print args if args[0] == '-f' short = false clz = args[1]