This is an automated email from the ASF dual-hosted git repository. wusheng pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/skywalking.git
The following commit(s) were added to refs/heads/master by this push:
     new 8df362b  Adjust ElasticSearch index refresh period as INT(flushInterval * 2/3) (#7310)
8df362b is described below

commit 8df362b92934e44557da35254aa04317e76ea7c9
Author: 吴晟 Wu Sheng <wu.sh...@foxmail.com>
AuthorDate: Fri Jul 16 12:10:35 2021 +0800

    Adjust ElasticSearch index refresh period as INT(flushInterval * 2/3) (#7310)

    Adjust index refresh period as INT(flushInterval * 2/3), it used to be the same as bulk flush period. At the edge case,
    in low traffic (traffic < bulkActions in the whole period), there is a possible case, 2 period bulks are included in
    one index refresh rebuild operation, which could cause version conflicts. And this case can't be fixed
    through core/persistentPeriod as the bulk flush is not controlled by the persistent timer anymore.

    This PR should avoid the following exception in the low load case, especially when bulkActions is set larger than
    the number of a metric type.
---
 CHANGES.md                                                 |  4 ++++
 docs/en/setup/backend/configuration-vocabulary.md          |  4 ++--
 .../server-bootstrap/src/main/resources/application.yml    | 12 +++++++++---
 .../elasticsearch/StorageModuleElasticsearchConfig.java    |  5 ++++-
 .../plugin/elasticsearch/base/StorageEsInstaller.java      | 14 +++++++++++++-
 5 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 47a03e2..7bbde5e 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -94,6 +94,10 @@ Release Notes.
   default flush period of hour and day level metrics are 25s * 4.
 * Performance: optimize IDs read of ElasticSearch storage options(6 and 7). Use the physical index rather than template alias name.
+* Adjust index refresh period as INT(flushInterval * 2/3), it used to be as same as bulk flush period. At the edge case,
+  in low traffic(traffic < bulkActions in the whole period), there is a possible case, 2 period bulks are included in
+  one index refresh rebuild operation, which could cause version conflicts. And this case can't be fixed
+  through `core/persistentPeriod` as the bulk flush is not controlled by the persistent timer anymore.

 #### UI

diff --git a/docs/en/setup/backend/configuration-vocabulary.md b/docs/en/setup/backend/configuration-vocabulary.md
index fb79c15..6ddd0fa 100644
--- a/docs/en/setup/backend/configuration-vocabulary.md
+++ b/docs/en/setup/backend/configuration-vocabulary.md
@@ -99,7 +99,7 @@ core|default|role|Option values, `Mixed/Receiver/Aggregator`. **Receiver** mode
 | - | - | superDatasetIndexShardsFactor | Super data set has been defined in the codes, such as trace segments. This factor provides more shards for the super data set, shards number = indexShardsNumber * superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger traces.|SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR|5 |
 | - | - | superDatasetIndexReplicasNumber | Represent the replicas number in the super size dataset record index.|SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER|0 |
 | - | - | bulkActions| Async bulk size of the record data batch execution. | SW_STORAGE_ES_BULK_ACTIONS| 5000|
-| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or not. Unit is second.| SW_STORAGE_ES_FLUSH_INTERVAL | 15|
+| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or not. Unit is second. INT(flushInterval * 2/3) would be used for index refresh period.| SW_STORAGE_ES_FLUSH_INTERVAL | 15 (index refresh period = 10)|
 | - | - | concurrentRequests| The number of concurrent requests allowed to be executed. | SW_STORAGE_ES_CONCURRENT_REQUESTS| 2 |
 | - | - | resultWindowMaxSize | The max size of dataset when OAP loading cache, such as network alias. | SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE | 10000|
 | - | - | metadataQueryMaxSize | The max size of metadata per query. | SW_STORAGE_ES_QUERY_MAX_SIZE | 5000 |
@@ -124,7 +124,7 @@ core|default|role|Option values, `Mixed/Receiver/Aggregator`. **Receiver** mode
 | - | - | superDatasetIndexShardsFactor | Super data set has been defined in the codes, such as trace segments. This factor provides more shards for the super data set, shards number = indexShardsNumber * superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger traces.|SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR|5 |
 | - | - | superDatasetIndexReplicasNumber | Represent the replicas number in the super size dataset record index.|SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER|0 |
 | - | - | bulkActions| Async bulk size of data batch execution. | SW_STORAGE_ES_BULK_ACTIONS| 5000|
-| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or not. Unit is second.| SW_STORAGE_ES_FLUSH_INTERVAL | 10|
+| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or not. Unit is second. INT(flushInterval * 2/3) would be used for index refresh period.| SW_STORAGE_ES_FLUSH_INTERVAL | 15 (index refresh period = 10)|
 | - | - | concurrentRequests| The number of concurrent requests allowed to be executed. | SW_STORAGE_ES_CONCURRENT_REQUESTS| 2 |
 | - | - | resultWindowMaxSize | The max size of dataset when OAP loading cache, such as network alias. | SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE | 10000|
 | - | - | metadataQueryMaxSize | The max size of metadata per query. | SW_STORAGE_ES_QUERY_MAX_SIZE | 5000 |
diff --git a/oap-server/server-bootstrap/src/main/resources/application.yml b/oap-server/server-bootstrap/src/main/resources/application.yml
index 06ba88c..735f843 100755
--- a/oap-server/server-bootstrap/src/main/resources/application.yml
+++ b/oap-server/server-bootstrap/src/main/resources/application.yml
@@ -142,7 +142,9 @@ storage:
     superDatasetIndexShardsFactor: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR:5} # This factor provides more shards for the super data set, shards number = indexShardsNumber * superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger traces.
     superDatasetIndexReplicasNumber: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER:0} # Represent the replicas number in the super size dataset record index, the default value is 0.
     bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
-    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15} # flush the bulk every 10 seconds whatever the number of requests
+    # flush the bulk every 10 seconds whatever the number of requests
+    # INT(flushInterval * 2/3) would be used for index refresh period.
+    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
     concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
     resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
     metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
@@ -170,7 +172,9 @@ storage:
     password: ${SW_ES_PASSWORD:""}
     secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} # Secrets management file in the properties format includes the username, password, which are managed by 3rd party tool.
     bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
-    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15} # flush the bulk every 10 seconds whatever the number of requests
+    # flush the bulk every 10 seconds whatever the number of requests
+    # INT(flushInterval * 2/3) would be used for index refresh period.
+    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
     concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
     resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
     metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
@@ -251,7 +255,9 @@ storage:
     password: ${SW_ES_PASSWORD:""}
     secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} # Secrets management file in the properties format includes the username, password, which are managed by 3rd party tool.
     bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
-    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15} # flush the bulk every 10 seconds whatever the number of requests
+    # flush the bulk every 10 seconds whatever the number of requests
+    # INT(flushInterval * 2/3) would be used for index refresh period.
+    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
     concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
     resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
     metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
diff --git a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
index 8ca9e0e..62411f8 100644
--- a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
+++ b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
@@ -69,9 +69,12 @@ public class StorageModuleElasticsearchConfig extends ModuleConfig {
      */
     private int bulkActions = 5000;
     /**
-     * Period of flush, no matter `bulkActions` reached or not. Unit is second.
+     * Period of flush, no matter `bulkActions` reached or not.
+     * INT(flushInterval * 2/3) would be used for index refresh period.
+     * Unit is second.
      *
      * @since 8.7.0 increase to 15s from 10s
+     * @since 8.7.0 use INT(flushInterval * 2/3) as ElasticSearch index refresh interval. Default is 10s.
      */
     private int flushInterval = 15;
     private int concurrentRequests = 2;
diff --git a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
index d8eb8cf..7fd9f22 100644
--- a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
+++ b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
@@ -162,7 +162,19 @@ public class StorageEsInstaller extends ModelInstaller {
         setting.put("index.number_of_shards", model.isSuperDataset()
             ? config.getIndexShardsNumber() * config.getSuperDatasetIndexShardsFactor()
             : config.getIndexShardsNumber());
-        setting.put("index.refresh_interval", TimeValue.timeValueSeconds(config.getFlushInterval()).toString());
+        // Set the index refresh period as INT(flushInterval * 2/3). At the edge case,
+        // in low traffic(traffic < bulkActions in the whole period), there is a possible case, 2 period bulks are included in
+        // one index refresh rebuild operation, which could cause version conflicts. And this case can't be fixed
+        // through `core/persistentPeriod` as the bulk flush is not controlled by the persistent timer anymore.
+        int indexRefreshInterval = config.getFlushInterval() * 2 / 3;
+        if (indexRefreshInterval < 5) {
+            // The refresh interval should not be less than 5 seconds (the recommended default value = 10s),
+            // and the bulk flush interval should not be set less than 8s (the recommended default value = 15s).
+            // This is a precaution case which makes ElasticSearch server has reasonable refresh interval,
+            // even this value is set too small by end user manually.
+            indexRefreshInterval = 5;
+        }
+        setting.put("index.refresh_interval", TimeValue.timeValueSeconds(indexRefreshInterval).toString());
         setting.put("analysis", getAnalyzerSetting(model.getColumns()));
         if (!StringUtil.isEmpty(config.getAdvanced())) {
             Map<String, Object> advancedSettings = gson.fromJson(config.getAdvanced(), Map.class);
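
For readers who want to see what the new rule yields for a given flushInterval, the arithmetic in the StorageEsInstaller change can be sketched on its own. This is only an illustration, not code from the commit: the class name RefreshIntervalSketch and the method indexRefreshInterval are hypothetical, and the sketch assumes nothing beyond the integer division and the 5-second floor visible in the diff.

```java
// Illustrative sketch only; class and method names are hypothetical and do not exist in SkyWalking.
public class RefreshIntervalSketch {

    // Lower bound applied by the commit when the computed value is too small.
    private static final int MIN_REFRESH_SECONDS = 5;

    // Mirrors INT(flushInterval * 2/3): Java integer division truncates the fraction.
    static int indexRefreshInterval(int flushIntervalSeconds) {
        int refresh = flushIntervalSeconds * 2 / 3;
        // Never go below the 5-second floor, even if flushInterval is configured very small.
        return Math.max(refresh, MIN_REFRESH_SECONDS);
    }

    public static void main(String[] args) {
        int[] samples = {15, 10, 8, 7, 3};
        for (int flush : samples) {
            System.out.println("flushInterval=" + flush
                + "s -> index.refresh_interval=" + indexRefreshInterval(flush) + "s");
        }
    }
}
```

With the default flushInterval of 15 the result is 10, matching the "index refresh period = 10" note in the configuration vocabulary. A flushInterval of 8 gives exactly 5, which is consistent with the code comment that the bulk flush interval should not be set below 8s; anything smaller is clamped to the floor.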