This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch polish-refresh
in repository https://gitbox.apache.org/repos/asf/skywalking.git

commit d32b5e7f7c9afd82e7c5f917102c1a183725d11b
Author: Wu Sheng <wu.sh...@foxmail.com>
AuthorDate: Fri Jul 16 08:13:11 2021 +0800

    Adjust ElasticSearch index refresh period as INT(flushInterval * 2/3)
---
 CHANGES.md                                         |   4 +
 .../backend/backend-infrastructure-monitoring.md   | 119 +++++++++++++++++++++
 docs/en/setup/backend/configuration-vocabulary.md  |   4 +-
 .../src/main/resources/application.yml             |  12 ++-
 .../StorageModuleElasticsearchConfig.java          |   5 +-
 .../elasticsearch/base/StorageEsInstaller.java     |  14 ++-
 6 files changed, 151 insertions(+), 7 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 47a03e2..7bbde5e 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -94,6 +94,10 @@ Release Notes.
   default flush period of hour and day level metrics are 25s * 4.
 * Performance: optimize IDs read of ElasticSearch storage options(6 and 7). Use the physical index rather than template
   alias name.
+* Adjust the index refresh period to INT(flushInterval * 2/3); it used to be the same as the bulk flush period. For example,
+  with the default flushInterval of 15s, the index refresh period becomes 10s. In an edge case with low traffic
+  (traffic < bulkActions within the whole period), two periods' bulks could be included in one index refresh rebuild
+  operation, which could cause version conflicts. This case can't be fixed through `core/persistentPeriod`, as the bulk
+  flush is no longer controlled by the persistence timer.
 
 #### UI
 
diff --git a/docs/en/setup/backend/backend-infrastructure-monitoring.md b/docs/en/setup/backend/backend-infrastructure-monitoring.md
new file mode 100644
index 0000000..53163aa
--- /dev/null
+++ b/docs/en/setup/backend/backend-infrastructure-monitoring.md
@@ -0,0 +1,119 @@
+# VMs monitoring 
+SkyWalking leverages the Prometheus node-exporter to collect metrics data from VMs, and the OpenTelemetry Collector to transfer the metrics to the
+[OpenTelemetry receiver](backend-receivers.md#opentelemetry-receiver) and into the [Meter System](./../../concepts-and-designs/meter.md).  
+We define the VM entity as a `Service` in OAP, and use `vm::` as a prefix to identify it.  
+
+## Data flow
+1. The Prometheus node-exporter collects metrics data from the VMs.
+2. The OpenTelemetry Collector fetches metrics from the node-exporter via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenCensus gRPC Exporter.
+3. The SkyWalking OAP Server parses the expressions with [MAL](../../concepts-and-designs/mal.md) to filter/calculate/aggregate and store the results. 
+
+## Setup 
+1. Set up the [Prometheus node-exporter](https://prometheus.io/docs/guides/node-exporter/).
+2. Set up the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/). An example OpenTelemetry Collector configuration is provided in [otel-collector-config.yaml](../../../../test/e2e/e2e-test/docker/promOtelVM/otel-collector-config.yaml); a minimal sketch is also shown after this list.
+3. Configure the SkyWalking [OpenTelemetry receiver](backend-receivers.md#opentelemetry-receiver).
+   
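+Below is a minimal, illustrative sketch of such a collector configuration. It is an assumption-based example: the host names (`vm-host`, `oap-server`), the job name, and the scrape interval are placeholders, and exporter options may differ between collector versions; the linked otel-collector-config.yaml is the authoritative reference.
+```yaml
+# Sketch only: scrape node-exporter with the Prometheus receiver and forward
+# the metrics to the SkyWalking OAP server via the OpenCensus exporter.
+receivers:
+  prometheus:
+    config:
+      scrape_configs:
+        - job_name: 'vm-monitoring'        # placeholder job name
+          scrape_interval: 10s
+          static_configs:
+            - targets: ['vm-host:9100']    # node-exporter default port
+
+exporters:
+  opencensus:
+    endpoint: 'oap-server:11800'           # OAP gRPC port
+    insecure: true                         # plain gRPC; adjust for your setup
+
+service:
+  pipelines:
+    metrics:
+      receivers: [prometheus]
+      exporters: [opencensus]
+```
+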
+## Supported Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description | Data Source |
+|-----|-----|-----|-----|-----|
+| CPU Usage | % | cpu_total_percentage | The total percentage usage of the CPU cores. If there are 2 cores, the maximum usage is 200%. | Prometheus node-exporter |
+| Memory RAM Usage | MB | meter_vm_memory_used | The total RAM usage | Prometheus node-exporter |
+| Memory Swap Usage | % | meter_vm_memory_swap_percentage | The percentage usage of swap memory | Prometheus node-exporter |
+| CPU Average Used | % | meter_vm_cpu_average_used | The percentage usage of the CPU cores in each mode | Prometheus node-exporter |
+| CPU Load |  | meter_vm_cpu_load1<br />meter_vm_cpu_load5<br />meter_vm_cpu_load15 | The CPU 1m / 5m / 15m average load | Prometheus node-exporter |
+| Memory RAM | MB | meter_vm_memory_total<br />meter_vm_memory_available<br />meter_vm_memory_used | The RAM statistics, including Total / Available / Used | Prometheus node-exporter |
+| Memory Swap | MB | meter_vm_memory_swap_free<br />meter_vm_memory_swap_total | The swap memory statistics, including Free / Total | Prometheus node-exporter |
+| File System Mountpoint Usage | % | meter_vm_filesystem_percentage | The percentage usage of the file system at each mount point | Prometheus node-exporter |
+| Disk R/W | KB/s | meter_vm_disk_read<br />meter_vm_disk_written | The disk read and write rates | Prometheus node-exporter |
+| Network Bandwidth Usage | KB/s | meter_vm_network_receive<br />meter_vm_network_transmit | The network receive and transmit rates | Prometheus node-exporter |
+| Network Status |  | meter_vm_tcp_curr_estab<br />meter_vm_tcp_tw<br />meter_vm_tcp_alloc<br />meter_vm_sockets_used<br />meter_vm_udp_inuse | The number of TCP connections established / in TIME_WAIT / allocated, sockets in use, and UDP sockets in use | Prometheus node-exporter |
+| Filefd Allocated |  | meter_vm_filefd_allocated | The number of file descriptors allocated | Prometheus node-exporter |
+
+## Customizing 
+You can customize your own metrics/expressions/dashboard panels.  
+The metrics definitions and expression rules are found in `/config/otel-oc-rules/vm.yaml`.  
+The dashboard panel configurations are found in `/config/ui-initialized-templates/vm.yml`.
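+The fragment below illustrates the general shape of such a rule file. It is a hypothetical sketch: the rule name and the expression are assumptions for illustration only, and the bundled `/config/otel-oc-rules/vm.yaml` remains the source of truth.
+```yaml
+# Hypothetical fragment showing the layout of a MAL rule file.
+metricPrefix: meter_vm
+metricsRules:
+  # Derive "memory used" from two node-exporter gauges.
+  - name: memory_used
+    exp: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
+```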
+
+## Blog
+For more details, see the blog article [SkyWalking 8.4 provides infrastructure monitoring](https://skywalking.apache.org/blog/2021-02-07-infrastructure-monitoring/).
+
+# K8s monitoring 
+SkyWalking leverages K8s kube-state-metrics and cAdvisor to collect metrics data from K8s, and the OpenTelemetry Collector to transfer the metrics to the
+[OpenTelemetry receiver](backend-receivers.md#opentelemetry-receiver) and into the [Meter System](./../../concepts-and-designs/meter.md). This feature requires authorizing the OAP Server to access the K8s `API Server`.  
+We define the k8s-cluster as a `Service` in the OAP, and use `k8s-cluster::` as a prefix to identify it.  
+We define the k8s-node as an `Instance` in the OAP, and set its name as the K8s `node name`.  
+We define the k8s-service as an `Endpoint` in the OAP, and set its name as `$serviceName.$namespace`.  
+
+## Data flow
+1. K8s kube-state-metrics and cAdvisor collect metrics data from K8s.
+2. The OpenTelemetry Collector fetches metrics from kube-state-metrics and cAdvisor via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenCensus gRPC Exporter.
+3. The SkyWalking OAP Server accesses the K8s `API Server` to get metadata, and parses the expressions with [MAL](../../concepts-and-designs/mal.md) to filter/calculate/aggregate and store the results. 
+
+## Setup 
+1. Set up [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics#kubernetes-deployment).
+2. cAdvisor is integrated into `kubelet` by default.
+3. Set up the [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/#kubernetes). For details on the Prometheus Receiver in the OpenTelemetry Collector for K8s, refer to [this example](https://github.com/prometheus/prometheus/blob/main/documentation/examples/prometheus-kubernetes.yml). For a quick start, we have provided a full example OpenTelemetry Collector configuration: [otel-collector-config.yaml](otel-collector-config.yaml).
+4. Configure the SkyWalking [OpenTelemetry receiver](backend-receivers.md#opentelemetry-receiver); a sketch of the relevant OAP settings follows this list.
+
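+As a rough reference, enabling the OpenTelemetry receiver and the K8s rules on the OAP side looks like the fragment below. This is a sketch under assumptions: the environment variable names and the rule list shown here should be verified against the `application.yml` shipped with your SkyWalking release.
+```yaml
+# Sketch of the OAP-side receiver settings in application.yml (names assumed).
+receiver-otel:
+  selector: ${SW_OTEL_RECEIVER:default}
+  default:
+    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"oc"}
+    enabledOcRules: ${SW_OTEL_RECEIVER_ENABLED_OC_RULES:"k8s-cluster,k8s-node,k8s-service"}
+```
+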
+## Supported Metrics
+K8s is monitored from different points of view, so there are 3 kinds of metrics: [Cluster](#cluster) / [Node](#node) / [Service](#service).
+
+### Cluster 
+These metrics are related to the selected cluster (the `Current Service` in the dashboard).
+
+| Monitoring Panel | Unit | Metric Name | Description | Data Source |
+|-----|-----|-----|-----|-----|
+| Node Total |  | k8s_cluster_node_total | The number of nodes | K8s kube-state-metrics |
+| Namespace Total |  | k8s_cluster_namespace_total | The number of namespaces | K8s kube-state-metrics |
+| Deployment Total |  | k8s_cluster_deployment_total | The number of deployments | K8s kube-state-metrics |
+| Service Total |  | k8s_cluster_service_total | The number of services | K8s kube-state-metrics |
+| Pod Total |  | k8s_cluster_pod_total | The number of pods | K8s kube-state-metrics |
+| Container Total |  | k8s_cluster_container_total | The number of containers | K8s kube-state-metrics |
+| CPU Resources | m | k8s_cluster_cpu_cores<br />k8s_cluster_cpu_cores_requests<br />k8s_cluster_cpu_cores_limits<br />k8s_cluster_cpu_cores_allocatable | The capacity and the Requests / Limits / Allocatable of the CPU | K8s kube-state-metrics |
+| Memory Resources | GB | k8s_cluster_memory_total<br />k8s_cluster_memory_requests<br />k8s_cluster_memory_limits<br />k8s_cluster_memory_allocatable | The capacity and the Requests / Limits / Allocatable of the memory | K8s kube-state-metrics |
+| Storage Resources | GB | k8s_cluster_storage_total<br />k8s_cluster_storage_allocatable | The capacity and the Allocatable of the storage | K8s kube-state-metrics |
+| Node Status |  | k8s_cluster_node_status | The current status of the nodes | K8s kube-state-metrics |
+| Deployment Status |  | k8s_cluster_deployment_status | The current status of the deployments | K8s kube-state-metrics |
+| Deployment Spec Replicas |  | k8s_cluster_deployment_spec_replicas | The number of desired pods for a deployment | K8s kube-state-metrics |
+| Service Status |  | k8s_cluster_service_pod_status | The current status of the services, depending on the related pods' status | K8s kube-state-metrics |
+| Pod Status Not Running |  | k8s_cluster_pod_status_not_running | The pods that are currently not in the running phase | K8s kube-state-metrics |
+| Pod Status Waiting |  | k8s_cluster_pod_status_waiting | The pods and containers that are currently in the waiting status, with reasons shown | K8s kube-state-metrics |
+| Pod Status Terminated |  | k8s_cluster_container_status_terminated | The pods and containers that are currently in the terminated status, with reasons shown | K8s kube-state-metrics |
+
+### Node
+These metrics are related to the selected node (the `Current Instance` in the dashboard).
+
+| Monitoring Panel | Unit | Metric Name | Description | Data Source |
+|-----|-----|-----|-----|-----|
+| Pod Total |  | k8s_node_pod_total | The number of pods in this node | K8s kube-state-metrics |
+| Node Status |  | k8s_node_node_status | The current status of this node | K8s kube-state-metrics |
+| CPU Resources | m | k8s_node_cpu_cores<br />k8s_node_cpu_cores_allocatable<br />k8s_node_cpu_cores_requests<br />k8s_node_cpu_cores_limits | The capacity and the Requests / Limits / Allocatable of the CPU | K8s kube-state-metrics |
+| Memory Resources | GB | k8s_node_memory_total<br />k8s_node_memory_allocatable<br />k8s_node_memory_requests<br />k8s_node_memory_limits | The capacity and the Requests / Limits / Allocatable of the memory | K8s kube-state-metrics |
+| Storage Resources | GB | k8s_node_storage_total<br />k8s_node_storage_allocatable | The capacity and the Allocatable of the storage | K8s kube-state-metrics |
+| CPU Usage | m | k8s_node_cpu_usage | The total usage of the CPU cores. If there are 2 cores, the maximum usage is 2000m. | cAdvisor |
+| Memory Usage | GB | k8s_node_memory_usage | The total memory usage | cAdvisor |
+| Network I/O | KB/s | k8s_node_network_receive<br />k8s_node_network_transmit | The network receive and transmit rates | cAdvisor |
+
+### Service
+In these metrics, the pods are related to the selected service (the `Current Endpoint` in the dashboard).
+
+| Monitoring Panel | Unit | Metric Name | Description | Data Source |
+|-----|-----|-----|-----|-----|
+| Service Pod Total |  | k8s_service_pod_total | The number of pods | K8s kube-state-metrics |
+| Service Pod Status |  | k8s_service_pod_status | The current status of the pods | K8s kube-state-metrics |
+| Service CPU Resources | m | k8s_service_cpu_cores_requests<br />k8s_service_cpu_cores_limits | The CPU resource Requests / Limits of this service | K8s kube-state-metrics |
+| Service Memory Resources | MB | k8s_service_memory_requests<br />k8s_service_memory_limits | The memory resource Requests / Limits of this service | K8s kube-state-metrics |
+| Pod CPU Usage | m | k8s_service_pod_cpu_usage | The total CPU usage of the pods | cAdvisor |
+| Pod Memory Usage | MB | k8s_service_pod_memory_usage | The total memory usage of the pods | cAdvisor |
+| Pod Waiting |  | k8s_service_pod_status_waiting | The pods and containers that are currently in the waiting status, with reasons shown | K8s kube-state-metrics |
+| Pod Terminated |  | k8s_service_pod_status_terminated | The pods and containers that are currently in the terminated status, with reasons shown | K8s kube-state-metrics |
+| Pod Restarts |  | k8s_service_pod_status_restarts_total | The number of container restarts for the related pods | K8s kube-state-metrics |
+| Pod Network Receive | KB/s | k8s_service_pod_network_receive | The network receive rate of the pods | cAdvisor |
+| Pod Network Transmit | KB/s | k8s_service_pod_network_transmit | The network transmit rate of the pods | cAdvisor |
+| Pod Storage Usage | MB | k8s_service_pod_fs_usage | The total storage usage of the pods related to this service | cAdvisor |
+
+## Customizing 
+You can customize your own metrics/expressions/dashboard panels.  
+The metrics definitions and expression rules are found in `/config/otel-oc-rules/k8s-cluster.yaml`, `/config/otel-oc-rules/k8s-node.yaml`, and `/config/otel-oc-rules/k8s-service.yaml`.  
+The dashboard panel configurations are found in `/config/ui-initialized-templates/k8s.yml`.
diff --git a/docs/en/setup/backend/configuration-vocabulary.md b/docs/en/setup/backend/configuration-vocabulary.md
index fb79c15..6ddd0fa 100644
--- a/docs/en/setup/backend/configuration-vocabulary.md
+++ b/docs/en/setup/backend/configuration-vocabulary.md
@@ -99,7 +99,7 @@ core|default|role|Option values, `Mixed/Receiver/Aggregator`. **Receiver** mode
 | - | - | superDatasetIndexShardsFactor | Super data set has been defined in the codes, such as trace segments. This factor provides more shards for the super data set, shards number = indexShardsNumber * superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger traces.|SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR|5 |
 | - | - | superDatasetIndexReplicasNumber | Represent the replicas number in the super size dataset record index.|SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER|0 |
 | - | - | bulkActions| Async bulk size of the record data batch execution. | SW_STORAGE_ES_BULK_ACTIONS| 5000|
-| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or not. Unit is second.| SW_STORAGE_ES_FLUSH_INTERVAL | 15|
+| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or not. Unit is second. INT(flushInterval * 2/3) is used as the index refresh period.| SW_STORAGE_ES_FLUSH_INTERVAL | 15 (index refresh period = 10)|
 | - | - | concurrentRequests| The number of concurrent requests allowed to be executed. | SW_STORAGE_ES_CONCURRENT_REQUESTS| 2 |
 | - | - | resultWindowMaxSize | The max size of dataset when OAP loading cache, such as network alias. | SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE | 10000|
 | - | - | metadataQueryMaxSize | The max size of metadata per query. | SW_STORAGE_ES_QUERY_MAX_SIZE | 5000 |
@@ -124,7 +124,7 @@ core|default|role|Option values, `Mixed/Receiver/Aggregator`. **Receiver** mode
 | - | - | superDatasetIndexShardsFactor | Super data set has been defined in the codes, such as trace segments. This factor provides more shards for the super data set, shards number = indexShardsNumber * superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger traces.|SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR|5 |
 | - | - | superDatasetIndexReplicasNumber | Represent the replicas number in the super size dataset record index.|SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER|0 |
 | - | - | bulkActions| Async bulk size of data batch execution. | SW_STORAGE_ES_BULK_ACTIONS| 5000|
-| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or not. Unit is second.| SW_STORAGE_ES_FLUSH_INTERVAL | 10|
+| - | - | flushInterval| Period of flush, no matter `bulkActions` reached or not. Unit is second. INT(flushInterval * 2/3) is used as the index refresh period.| SW_STORAGE_ES_FLUSH_INTERVAL | 15 (index refresh period = 10)|
 | - | - | concurrentRequests| The number of concurrent requests allowed to be executed. | SW_STORAGE_ES_CONCURRENT_REQUESTS| 2 |
 | - | - | resultWindowMaxSize | The max size of dataset when OAP loading cache, such as network alias. | SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE | 10000|
 | - | - | metadataQueryMaxSize | The max size of metadata per query. | SW_STORAGE_ES_QUERY_MAX_SIZE | 5000 |
diff --git a/oap-server/server-bootstrap/src/main/resources/application.yml b/oap-server/server-bootstrap/src/main/resources/application.yml
index 06ba88c..735f843 100755
--- a/oap-server/server-bootstrap/src/main/resources/application.yml
+++ b/oap-server/server-bootstrap/src/main/resources/application.yml
@@ -142,7 +142,9 @@ storage:
     superDatasetIndexShardsFactor: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR:5} # This factor provides more shards for the super data set, shards number = indexShardsNumber * superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger traces.
     superDatasetIndexReplicasNumber: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER:0} # Represent the replicas number in the super size dataset record index, the default value is 0.
     bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
-    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15} # flush the bulk every 10 seconds whatever the number of requests
+    # flush the bulk every 15 seconds regardless of the number of requests
+    # INT(flushInterval * 2/3) is used as the index refresh period.
+    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
     concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
     resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
     metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
@@ -170,7 +172,9 @@ storage:
     password: ${SW_ES_PASSWORD:""}
     secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} # Secrets management file in the properties format includes the username, password, which are managed by 3rd party tool.
     bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
-    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15} # flush the bulk every 10 seconds whatever the number of requests
+    # flush the bulk every 15 seconds regardless of the number of requests
+    # INT(flushInterval * 2/3) is used as the index refresh period.
+    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
     concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
     resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
     metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
@@ -251,7 +255,9 @@ storage:
     password: ${SW_ES_PASSWORD:""}
     secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} # Secrets management file in the properties format includes the username, password, which are managed by 3rd party tool.
     bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:5000} # Execute the async bulk record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
-    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15} # flush the bulk every 10 seconds whatever the number of requests
+    # flush the bulk every 15 seconds regardless of the number of requests
+    # INT(flushInterval * 2/3) is used as the index refresh period.
+    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:15}
     concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
     resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
     metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
diff --git a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
index 8ca9e0e..62411f8 100644
--- a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
+++ b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/StorageModuleElasticsearchConfig.java
@@ -69,9 +69,12 @@ public class StorageModuleElasticsearchConfig extends ModuleConfig {
      */
     private int bulkActions = 5000;
     /**
-     * Period of flush, no matter `bulkActions` reached or not. Unit is second.
+     * Period of flush, no matter `bulkActions` reached or not.
+     * INT(flushInterval * 2/3) would be used for index refresh period.
+     * Unit is second.
      *
      * @since 8.7.0 increase to 15s from 10s
+     * @since 8.7.0 use INT(flushInterval * 2/3) as ElasticSearch index refresh interval. Default is 10s.
      */
     private int flushInterval = 15;
     private int concurrentRequests = 2;
diff --git a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
index d8eb8cf..7fd9f22 100644
--- a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
+++ b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/StorageEsInstaller.java
@@ -162,7 +162,19 @@ public class StorageEsInstaller extends ModelInstaller {
         setting.put("index.number_of_shards", model.isSuperDataset()
             ? config.getIndexShardsNumber() * config.getSuperDatasetIndexShardsFactor()
             : config.getIndexShardsNumber());
-        setting.put("index.refresh_interval", TimeValue.timeValueSeconds(config.getFlushInterval()).toString());
+        // Set the index refresh period as INT(flushInterval * 2/3). In an edge case with low traffic
+        // (traffic < bulkActions within the whole period), two periods' bulks could be included in one
+        // index refresh rebuild operation, which could cause version conflicts. This case can't be fixed
+        // through `core/persistentPeriod`, as the bulk flush is no longer controlled by the persistence timer.
+        int indexRefreshInterval = config.getFlushInterval() * 2 / 3;
+        if (indexRefreshInterval < 5) {
+            // The refresh interval should not be less than 5 seconds (the recommended default value is 10s),
+            // and the bulk flush interval should not be set to less than 8s (the recommended default value is 15s).
+            // This is a precaution to keep the ElasticSearch server on a reasonable refresh interval,
+            // even if the end user sets this value too small manually.
+            indexRefreshInterval = 5;
+        }
+        setting.put("index.refresh_interval", TimeValue.timeValueSeconds(indexRefreshInterval).toString());
         setting.put("analysis", getAnalyzerSetting(model.getColumns()));
         if (!StringUtil.isEmpty(config.getAdvanced())) {
             Map<String, Object> advancedSettings = gson.fromJson(config.getAdvanced(), Map.class);
