[skywalking] branch master updated: Adjust ElasticSearch index refresh period as INT(flushInterval * 2/3) (#7310)

wusheng Thu, 15 Jul 2021 21:10:53 -0700

This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/skywalking.git



The following commit(s) were added to refs/heads/master by this push:
     new 8df362b  Adjust ElasticSearch index refresh period as 
INT(flushInterval * 2/3) (#7310)
8df362b is described below

commit 8df362b92934e44557da35254aa04317e76ea7c9
Author: 吴晟 Wu Sheng <wu.sh...@foxmail.com>
AuthorDate: Fri Jul 16 12:10:35 2021 +0800

    Adjust ElasticSearch index refresh period as INT(flushInterval * 2/3) 
(#7310)
    
    Adjust the index refresh period to INT(flushInterval * 2/3); it used to be the 
same as the bulk flush period. In an edge case under low traffic (traffic < 
bulkActions in the whole period), the bulks of 2 flush periods could be 
included in one index refresh rebuild operation, which could cause version 
conflicts. This case can't be fixed through core/persistentPeriod, as the 
bulk flush is no longer controlled by the persistent timer.
    
    This PR should avoid the following exception in the low load case, 
especially when bulkActions is set larger than the number of a metric type.
---
 CHANGES.md                                                 |  4 ++++
 docs/en/setup/backend/configuration-vocabulary.md          |  4 ++--
 .../server-bootstrap/src/main/resources/application.yml    | 12 +++++++++---
 .../elasticsearch/StorageModuleElasticsearchConfig.java    |  5 ++++-
 .../plugin/elasticsearch/base/StorageEsInstaller.java      | 14 +++++++++++++-
 5 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 47a03e2..7bbde5e 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -94,6 +94,10 @@ Release Notes.
   default flush period of hour and day level metrics are 25s * 4.
 * Performance: optimize IDs read of ElasticSearch storage options(6 and 7). 
Use the physical index rather than template
   alias name.
+* Adjust the index refresh period to INT(flushInterval * 2/3); it used to be the 
same as the bulk flush period. In an edge case
+  under low traffic (traffic < bulkActions in the whole period), the bulks of 2 
flush periods could be included in
+  one index refresh rebuild operation, which could cause version conflicts. 
This case can't be fixed
+  through `core/persistentPeriod`, as the bulk flush is no longer controlled by the 
persistent timer.
 
 #### UI
 
diff --git a/docs/en/setup/backend/configuration-vocabulary.md 
b/docs/en/setup/backend/configuration-vocabulary.md
index fb79c15..6ddd0fa 100644
--- a/docs/en/setup/backend/configuration-vocabulary.md
+++ b/docs/en/setup/backend/configuration-vocabulary.md
@@ -99,7 +99,7 @@ core|default|role|Option values, `Mixed/Receiver/Aggregator`. 
**Receiver** mode
 | - | - | superDatasetIndexShardsFactor | Super data set has been defined in 
the codes, such as trace segments. This factor provides more shards for the 
super data set, shards number = indexShardsNumber * 
superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger 
traces.|SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR|5 |
 | - | - | superDatasetIndexReplicasNumber | Represent the replicas number in 
the super size dataset record 
index.|SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER|0 |
 | - | - | bulkActions| Async bulk size of the record data batch execution. | 
SW_STORAGE_ES_BULK_ACTIONS| 5000|
-| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or 
not. Unit is second.| SW_STORAGE_ES_FLUSH_INTERVAL | 15|
+| - | - | flushInterval| Period of flush, regardless of whether `bulkActions` is reached. 
Unit is second. INT(flushInterval * 2/3) is used as the index refresh 
period.| SW_STORAGE_ES_FLUSH_INTERVAL | 15 (index refresh period = 10)|
 | - | - | concurrentRequests| The number of concurrent requests allowed to be 
executed. | SW_STORAGE_ES_CONCURRENT_REQUESTS| 2 |
 | - | - | resultWindowMaxSize | The max size of dataset when OAP loading 
cache, such as network alias. | SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE | 10000|
 | - | - | metadataQueryMaxSize | The max size of metadata per query. | 
SW_STORAGE_ES_QUERY_MAX_SIZE | 5000 |
@@ -124,7 +124,7 @@ core|default|role|Option values, 
`Mixed/Receiver/Aggregator`. **Receiver** mode
 | - | - | superDatasetIndexShardsFactor | Super data set has been defined in 
the codes, such as trace segments. This factor provides more shards for the 
super data set, shards number = indexShardsNumber * 
superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger 
traces.|SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR|5 |
 | - | - | superDatasetIndexReplicasNumber | Represent the replicas number in 
the super size dataset record 
index.|SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER|0 |
 | - | - | bulkActions| Async bulk size of data batch execution. | 
SW_STORAGE_ES_BULK_ACTIONS| 5000|
-| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or 
not. Unit is second.| SW_STORAGE_ES_FLUSH_INTERVAL | 10|
+| - | - | flushInterval| Period of flush, regardless of whether `bulkActions` is reached. 
Unit is second. INT(flushInterval * 2/3) is used as the index refresh 
period.| SW_STORAGE_ES_FLUSH_INTERVAL | 15 (index refresh period = 10)|
 | - | - | concurrentRequests| The number of concurrent requests allowed to be 
executed. | SW_STORAGE_ES_CONCURRENT_REQUESTS| 2 |
 | - | - | resultWindowMaxSize | The max size of dataset when OAP loading 
cache, such as network alias. | SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE | 10000|
 | - | - | metadataQueryMaxSize | The max size of metadata per query. | 
SW_STORAGE_ES_QUERY_MAX_SIZE | 5000 |
diff --git a/oap-server/server-bootstrap/src/main/resources/application.yml 
b/oap-server/server-bootstrap/src/main/resources/application.yml
index 06ba88c..735f843 100755
--- a/oap-server/server-bootstrap/src/main/resources/application.yml
+++ b/oap-server/server-bootstrap/src/main/resources/application.yml
@@ -142,7 +142,9 @@ storage:
     superDatasetIndexShardsFactor: 
${SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR:5} #  This factor provides 
more shards for the super data set, shards number = indexShardsNumber * 
superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger 
traces.
     superDatasetIndexReplicasNumber: 
${SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER:0} # Represent the replicas 
number in the super size dataset record index, the default value is 0.
     bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk 
record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
-    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15} # flush the bulk every 
10 seconds whatever the number of requests
+    # flush the bulk every 15 seconds (by default) whatever the number of requests
+    # INT(flushInterval * 2/3) is used as the index refresh period.
+    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
     concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of 
concurrent requests
     resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
     metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
@@ -170,7 +172,9 @@ storage:
     password: ${SW_ES_PASSWORD:""}
     secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} # Secrets 
management file in the properties format includes the username, password, which 
are managed by 3rd party tool.
     bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk 
record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
-    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15} # flush the bulk every 
10 seconds whatever the number of requests
+    # flush the bulk every 15 seconds (by default) whatever the number of requests
+    # INT(flushInterval * 2/3) is used as the index refresh period.
+    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
     concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of 
concurrent requests
     resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
     metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
@@ -251,7 +255,9 @@ storage:
     password: ${SW_ES_PASSWORD:""}
     secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} # Secrets 
management file in the properties format includes the username, password, which 
are managed by 3rd party tool.
     bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk 
record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
-    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15} # flush the bulk every 
10 seconds whatever the number of requests
+    # flush the bulk every 15 seconds (by default) whatever the number of requests
+    # INT(flushInterval * 2/3) is used as the index refresh period.
+    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
     concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of 
concurrent requests
     resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
     metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
diff --git 
a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
 
b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
index 8ca9e0e..62411f8 100644
--- 
a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
+++ 
b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
@@ -69,9 +69,12 @@ public class StorageModuleElasticsearchConfig extends 
ModuleConfig {
      */
     private int bulkActions = 5000;
     /**
-     * Period of flush, no matter `bulkActions` reached or not. Unit is second.
+     * Period of flush, no matter `bulkActions` reached or not.
+     * INT(flushInterval * 2/3) would be used for index refresh period.
+     * Unit is second.
      *
      * @since 8.7.0 increase to 15s from 10s
+     * @since 8.7.0 use INT(flushInterval * 2/3) as ElasticSearch index 
refresh interval. Default is 10s.
      */
     private int flushInterval = 15;
     private int concurrentRequests = 2;
diff --git 
a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
 
b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
index d8eb8cf..7fd9f22 100644
--- 
a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
+++ 
b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
@@ -162,7 +162,19 @@ public class StorageEsInstaller extends ModelInstaller {
         setting.put("index.number_of_shards", model.isSuperDataset()
             ? config.getIndexShardsNumber() * 
config.getSuperDatasetIndexShardsFactor()
             : config.getIndexShardsNumber());
-        setting.put("index.refresh_interval", 
TimeValue.timeValueSeconds(config.getFlushInterval()).toString());
+        // Set the index refresh period to INT(flushInterval * 2/3). In an 
edge case
+        // under low traffic (traffic < bulkActions in the whole period), the bulks 
of 2 flush periods could be included in
+        // one index refresh rebuild operation, which could cause version 
conflicts. This case can't be fixed
+        // through `core/persistentPeriod`, as the bulk flush is no longer controlled 
by the persistent timer.
+        int indexRefreshInterval = config.getFlushInterval() * 2 / 3;
+        if (indexRefreshInterval < 5) {
+            // The refresh interval should not be less than 5 seconds (the 
recommended default value = 10s),
+            // and the bulk flush interval should not be set to less than 8s (the 
recommended default value = 15s).
+            // This is a precaution that keeps a reasonable refresh interval on the 
ElasticSearch server,
+            // even if the end user manually sets this value too small.
+            indexRefreshInterval = 5;
+        }
+        setting.put("index.refresh_interval", 
TimeValue.timeValueSeconds(indexRefreshInterval).toString());
         setting.put("analysis", getAnalyzerSetting(model.getColumns()));
         if (!StringUtil.isEmpty(config.getAdvanced())) {
             Map<String, Object> advancedSettings = 
gson.fromJson(config.getAdvanced(), Map.class);
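
For illustration only, the refresh-interval rule from the StorageEsInstaller hunk above can be sketched as a standalone program. The class name `RefreshIntervalSketch`, the method `indexRefreshInterval`, and the constant `MIN_REFRESH_SECONDS` are hypothetical names and are not part of the commit; only the integer arithmetic and the 5-second floor are taken from the diff.

```java
// A minimal sketch of the rule in the diff, not the actual SkyWalking code.
public class RefreshIntervalSketch {
    // Hypothetical constant; the diff hard-codes the floor as 5.
    static final int MIN_REFRESH_SECONDS = 5;

    // Integer arithmetic truncates, which is what INT(flushInterval * 2/3) means.
    static int indexRefreshInterval(int flushIntervalSeconds) {
        int refresh = flushIntervalSeconds * 2 / 3;
        // Clamp to the floor when the configured flush interval is too small.
        return Math.max(refresh, MIN_REFRESH_SECONDS);
    }

    public static void main(String[] args) {
        // 3 and 7 fall below the floor; 8 is the smallest value that reaches 5 unclamped.
        for (int flush : new int[] {3, 7, 8, 15, 30}) {
            System.out.println("flushInterval=" + flush + "s -> index.refresh_interval="
                + indexRefreshInterval(flush) + "s");
        }
    }
}
```

With the default flushInterval of 15, this yields a refresh interval of 10 seconds, matching the value stated in configuration-vocabulary.md. Any flushInterval below 8 is clamped to 5, which is why the code comment recommends not setting the bulk flush interval to less than 8s.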
