[GitHub] [flink] Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] translate dev/stream/state/checkpointing into Chinese

2019-10-27 Thread GitBox
Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] 
translate dev/stream/state/checkpointing into Chinese
URL: https://github.com/apache/flink/pull/9805#discussion_r339399783
 
 

 ##
 File path: docs/dev/stream/state/checkpointing.zh.md
 ##
 @@ -173,30 +165,26 @@ Some more parameters and/or defaults may be set via 
`conf/flink-conf.yaml` (see
 
 ## Selecting a State Backend
 
 Review comment:
   OK


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] translate dev/stream/state/checkpointing into Chinese

2019-10-25 Thread GitBox
Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] 
translate dev/stream/state/checkpointing into Chinese
URL: https://github.com/apache/flink/pull/9805#discussion_r339286584
 
 

 ##
 File path: docs/dev/stream/state/checkpointing.zh.md
 ##
 @@ -173,30 +165,26 @@ Some more parameters and/or defaults may be set via 
`conf/flink-conf.yaml` (see
 
 ## Selecting a State Backend
 
-Flink's [checkpointing mechanism]({{ site.baseurl 
}}/internals/stream_checkpointing.html) stores consistent snapshots
-of all the state in timers and stateful operators, including connectors, 
windows, and any [user-defined state](state.html).
-Where the checkpoints are stored (e.g., JobManager memory, file system, 
database) depends on the configured
-**State Backend**. 
-
-By default, state is kept in memory in the TaskManagers and checkpoints are 
stored in memory in the JobManager. For proper persistence of large state,
-Flink supports various approaches for storing and checkpointing state in other 
state backends. The choice of state backend can be configured via 
`StreamExecutionEnvironment.setStateBackend(…)`.
+Flink 的 [checkpointing mechanism]({{ site.baseurl 
}}/zh/internals/stream_checkpointing.html) 存储在定时器与状态操作里的持久化快照,
+包括连接器(connectors),窗口(windows)以及任何用户[自定义的状态](state.html)
+checkpoint 存储在那里取决于所配置的 **State Backend**(比如 JobManager memory、 file system、 
database)。
 
-See [state backends]({{ site.baseurl }}/ops/state/state_backends.html) for 
more details on the available state backends and options for job-wide and 
cluster-wide configuration.
+默认情况下,状态是保持在 TaskManagers 的内存中,checkpoint 保存在 JobManager 
的内存中。为了体量大的状态的能完全恰当的持久化,
+Flink 支持各种各样的途径去存储,来 checkpoint 状态到其他的 state backends。通过 
`StreamExecutionEnvironment.setStateBackend(…)` 来配置所选的 state backends。
 
+阅读 [state backends]({{ site.baseurl }}/zh/ops/state/state_backends.html) 
来查看在可用 state backends 上的更多细节,选择 job范围 与 集群返回 的配置。
 
-## State Checkpoints in Iterative Jobs
+## 在 Iterative Jobs 中的状态 checkpoint
 
-Flink currently only provides processing guarantees for jobs without 
iterations. Enabling checkpointing on an iterative job causes an exception. In 
order to force checkpointing on an iterative program the user needs to set a 
special flag when enabling checkpointing: `env.enableCheckpointing(interval, 
CheckpointingMode.EXACTLY_ONCE, force = true)`.
+Flink 现在只提供没有 iterations 的 job 的处理保证。在 iterative job 上激活 checkpoint 
会导致异常。为了在迭代程序中强制进行 checkpoint,用于需要在激活 checkpoint 时设置一个特殊的标志: 
`env.enableCheckpointing(interval, CheckpointingMode.EXACTLY_ONCE, force = 
true)`。
 
-Please note that records in flight in the loop edges (and the state changes 
associated with them) will be lost during failure.
+请注意在环形边上游走的记录(以及与之相关的状态变化)在故障时会丢失。
 
 {% top %}
 
+## 重启策略
 
-## Restart Strategies
-
-Flink supports different restart strategies which control how the jobs are 
restarted in case of a failure. For more 
-information, see [Restart Strategies]({{ site.baseurl 
}}/dev/restart_strategies.html).
+Flink 支持不同的重启策略,来控制 job 万一故障时该如何重启。更多信息请阅读 [Restart Strategies]({{ 
site.baseurl }}/zh/dev/restart_strategies.html)。
 
 Review comment:
   I checked; the link does redirect.




[GitHub] [flink] Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] translate dev/stream/state/checkpointing into Chinese

2019-10-25 Thread GitBox
Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] 
translate dev/stream/state/checkpointing into Chinese
URL: https://github.com/apache/flink/pull/9805#discussion_r339286218
 
 

 ##
 File path: docs/dev/stream/state/checkpointing.zh.md
 ##
 @@ -25,146 +25,138 @@ under the License.
 * ToC
 {:toc}
 
-Every function and operator in Flink can be **stateful** (see [working with 
state](state.html) for details).
-Stateful functions store data across the processing of individual 
elements/events, making state a critical building block for
-any type of more elaborate operation.
+Flink 中的每个方法或算子都能够是**有状态的**(阅读 [working with state](state.html) 查看详细)。
+状态化的方法在处理单个 元素/事件 的时候存储数据,让状态成为使各个类型的算子更加精细的重要部分。
+为了让状态容错,Flink 需要为状态添加**Checkpoint(检查点)**。Checkpoint 使得 Flink 
能够恢复状态和在流中的位置,从而向应用提供和无故障执行时一样的语义。
 
-In order to make state fault tolerant, Flink needs to **checkpoint** the 
state. Checkpoints allow Flink to recover state and positions
-in the streams to give the application the same semantics as a failure-free 
execution.
+[Documentation on streaming fault tolerance]({{ site.baseurl 
}}/zh/internals/stream_checkpointing.html) 介绍了 Flink 流计算容错机制内部的技术原理。
 
-The [documentation on streaming fault tolerance]({{ site.baseurl 
}}/internals/stream_checkpointing.html) describes in detail the technique 
behind Flink's streaming fault tolerance mechanism.
 
+## 前提条件
 
-## Prerequisites
+Flink 的 Checkpoint 机制会和持久化存储进行交互,交换流与状态。一般需要:
 
-Flink's checkpointing mechanism interacts with durable storage for streams and 
state. In general, it requires:
+  - 一个能够回放一段时间内数据的持久化数据源,例如持久化消息队列(例如 Apache Kafka、RabbitMQ、 Amazon Kinesis、 
Google PubSub 等)或文件系统(例如 HDFS、 S3、 GFS、 NFS、 Ceph 等)。
+  - 存放状态的持久化存储,通常为分布式文件系统(比如 HDFS、 S3、 GFS、 NFS、 Ceph 等)。
 
-  - A *persistent* (or *durable*) data source that can replay records for a 
certain amount of time. Examples for such sources are persistent messages 
queues (e.g., Apache Kafka, RabbitMQ, Amazon Kinesis, Google PubSub) or file 
systems (e.g., HDFS, S3, GFS, NFS, Ceph, ...).
-  - A persistent storage for state, typically a distributed filesystem (e.g., 
HDFS, S3, GFS, NFS, Ceph, ...)
+## 激活与配置 Checkpoint
 
+默认情况下,Checkpoint 是禁用的。通过调用 `StreamExecutionEnvironment` 的 
`enableCheckpointing(n)` 来激活 Checkpoint,里面的 *n* 是进行 Checkpoint 的间隔,单位毫秒。
 
-## Enabling and Configuring Checkpointing
+Checkpoint 其他的属性包括:
 
-By default, checkpointing is disabled. To enable checkpointing, call 
`enableCheckpointing(n)` on the `StreamExecutionEnvironment`, where *n* is the 
checkpoint interval in milliseconds.
-
-Other parameters for checkpointing include:
-
-  - *exactly-once vs. at-least-once*: You can optionally pass a mode to the 
`enableCheckpointing(n)` method to choose between the two guarantee levels.
-Exactly-once is preferable for most applications. At-least-once may be 
relevant for certain super-low-latency (consistently few milliseconds) 
applications.
-
-  - *checkpoint timeout*: The time after which a checkpoint-in-progress is 
aborted, if it did not complete by then.
-
-  - *minimum time between checkpoints*: To make sure that the streaming 
application makes a certain amount of progress between checkpoints,
-one can define how much time needs to pass between checkpoints. If this 
value is set for example to *5000*, the next checkpoint will be
-started no sooner than 5 seconds after the previous checkpoint completed, 
regardless of the checkpoint duration and the checkpoint interval.
-Note that this implies that the checkpoint interval will never be smaller 
than this parameter.
+  - *精确一次(exactly-once) 对比 至少一次(at-least-once)*:你可以选择向 
`enableCheckpointing(n)` 方法中传入一个模式来选择使用两种保证等级中的哪一种。
+对于大多数应用来说,精确一次是较好的选择。至少一次可能与某些延迟超低(始终只有几毫秒)的应用的关联较大。
+  
+  - *Checkpoint超时(checkpoint timeout)*:如果过了这个时间,还在进行中的 checkpoint 操作就会被抛弃。
+  
+  - *checkpoint 之间的最小时间(minimum time between checkpoints)*: 为了确保流应用在 
checkpoint 之间有足够的进展,可以定义在 checkpoint 之间需要多久的时间。如果值设置为了 *5000*,
+无论 checkpoint 持续时间与间隔是多久,在前一个 checkpoint 完成的五秒后才会开始下一个 checkpoint。
 
-It is often easier to configure applications by defining the "time between 
checkpoints" than the checkpoint interval, because the "time between 
checkpoints"
-is not susceptible to the fact that checkpoints may sometimes take longer 
than on average (for example if the target storage system is temporarily slow).
-
-Note that this value also implies that the number of concurrent 
checkpoints is *one*.
-
-  - *number of concurrent checkpoints*: By default, the system will not 
trigger another checkpoint while one is still in progress.
-This ensures that the topology does not spend too much time on checkpoints 
and not make progress with processing the streams.
-It is possible to allow for multiple overlapping checkpoints, which is 
interesting for pipelines that have a certain processing delay
-(for example because the functions call external services that need some 
time to respond) but that 

[GitHub] [flink] Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] translate dev/stream/state/checkpointing into Chinese

2019-10-11 Thread GitBox
Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] 
translate dev/stream/state/checkpointing into Chinese
URL: https://github.com/apache/flink/pull/9805#discussion_r334036831
 
 

 ##
 File path: docs/dev/stream/state/checkpointing.zh.md
 ##
 @@ -25,146 +25,138 @@ under the License.
 * ToC
 {:toc}
 
-Every function and operator in Flink can be **stateful** (see [working with 
state](state.html) for details).
-Stateful functions store data across the processing of individual 
elements/events, making state a critical building block for
-any type of more elaborate operation.
+Flink 中的每个方法或算子都能够是**有状态的**(阅读 [working with state](state.html) 查看详细)。
+状态化的方法在处理单个 元素/事件 的时候存储数据,让状态成为使各个类型的算子更加精细的重要部分。
+为了让状态容错,Flink 需要为状态添加**Checkpoint(检查点)**。Checkpoint 使得 Flink 
能够恢复状态和在流中的位置,从而向应用提供和无故障执行时一样的语义。
 
-In order to make state fault tolerant, Flink needs to **checkpoint** the 
state. Checkpoints allow Flink to recover state and positions
-in the streams to give the application the same semantics as a failure-free 
execution.
+[Documentation on streaming fault tolerance]({{ site.baseurl 
}}/zh/internals/stream_checkpointing.html) 介绍了 Flink 流计算容错机制的内部技术原理。
 
-The [documentation on streaming fault tolerance]({{ site.baseurl 
}}/internals/stream_checkpointing.html) describes in detail the technique 
behind Flink's streaming fault tolerance mechanism.
 
+## 前提条件
 
-## Prerequisites
+Flink 的 Checkpoint 机制会和持久化存储进行交互,交换流与状态。一般需要:
 
-Flink's checkpointing mechanism interacts with durable storage for streams and 
state. In general, it requires:
+  - 一个能够回放一段时间内数据的持久化数据源,例如持久化消息队列(例如 Apache Kafka、RabbitMQ、 Amazon Kinesis、 
Google PubSub 等)或文件系统(例如 HDFS、 S3、 GFS、 NFS、 Ceph 等)。
+  - 存放状态的持久化存储,通常为分布式文件系统(比如 HDFS、 S3、 GFS、 NFS、 Ceph 等)。
 
-  - A *persistent* (or *durable*) data source that can replay records for a 
certain amount of time. Examples for such sources are persistent messages 
queues (e.g., Apache Kafka, RabbitMQ, Amazon Kinesis, Google PubSub) or file 
systems (e.g., HDFS, S3, GFS, NFS, Ceph, ...).
-  - A persistent storage for state, typically a distributed filesystem (e.g., 
HDFS, S3, GFS, NFS, Ceph, ...)
+## 激活与配置 Checkpoint
 
+默认情况下,Checkpoint 是禁用的。通过调用 `StreamExecutionEnvironment` 的 
`enableCheckpointing(n)` 来激活 Checkpoint,里面的 *n* 是进行 Checkpoint 的间隔,单位毫秒。
 
-## Enabling and Configuring Checkpointing
+Checkpoint 其他的属性包括:
 
-By default, checkpointing is disabled. To enable checkpointing, call 
`enableCheckpointing(n)` on the `StreamExecutionEnvironment`, where *n* is the 
checkpoint interval in milliseconds.
-
-Other parameters for checkpointing include:
-
-  - *exactly-once vs. at-least-once*: You can optionally pass a mode to the 
`enableCheckpointing(n)` method to choose between the two guarantee levels.
-Exactly-once is preferable for most applications. At-least-once may be 
relevant for certain super-low-latency (consistently few milliseconds) 
applications.
-
-  - *checkpoint timeout*: The time after which a checkpoint-in-progress is 
aborted, if it did not complete by then.
-
-  - *minimum time between checkpoints*: To make sure that the streaming 
application makes a certain amount of progress between checkpoints,
-one can define how much time needs to pass between checkpoints. If this 
value is set for example to *5000*, the next checkpoint will be
-started no sooner than 5 seconds after the previous checkpoint completed, 
regardless of the checkpoint duration and the checkpoint interval.
-Note that this implies that the checkpoint interval will never be smaller 
than this parameter.
+  - *精确一次(exactly-once) 对比 至少一次(at-least-once)*:你可以选择向 
`enableCheckpointing(n)` 方法中传入一个模式来选择使用两种保证等级中的哪一种。
+对于大多数应用来说,精确一次是较好的选择。至少一次可能与某些延迟超低(始终只有几毫秒)的应用的关联较大。
+  
+  - *Checkpoint超时(checkpoint timeout)*:如果过了这个时间,还在进行中的 checkpoint 操作就会被抛弃。
+  
+  - *checkpoint 之间的最小时间(minimum time between checkpoints)*: 为了确保流应用在 
checkpoint 之间有足够的进展,可以定义在 checkpoint 之间需要多久的时间。如果值设置为了 *5000*,
+无论 checkpoint 持续时间与间隔是多久,在前一个 checkpoint 完成的五秒后才会开始下一个 checkpoint。
 
-It is often easier to configure applications by defining the "time between 
checkpoints" than the checkpoint interval, because the "time between 
checkpoints"
-is not susceptible to the fact that checkpoints may sometimes take longer 
than on average (for example if the target storage system is temporarily slow).
-
-Note that this value also implies that the number of concurrent 
checkpoints is *one*.
-
-  - *number of concurrent checkpoints*: By default, the system will not 
trigger another checkpoint while one is still in progress.
-This ensures that the topology does not spend too much time on checkpoints 
and not make progress with processing the streams.
-It is possible to allow for multiple overlapping checkpoints, which is 
interesting for pipelines that have a certain processing delay
-(for example because the functions call external services that need some 
time to respond) but that 

[GitHub] [flink] Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] translate dev/stream/state/checkpointing into Chinese

2019-10-11 Thread GitBox
Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] 
translate dev/stream/state/checkpointing into Chinese
URL: https://github.com/apache/flink/pull/9805#discussion_r334035180
 
 

 ##
 File path: docs/dev/stream/state/checkpointing.zh.md
 ##
 @@ -25,146 +25,138 @@ under the License.

[GitHub] [flink] Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] translate dev/stream/state/checkpointing into Chinese

2019-10-11 Thread GitBox
Henvealf commented on a change in pull request #9805: [FLINK-11635][docs-zh] 
translate dev/stream/state/checkpointing into Chinese
URL: https://github.com/apache/flink/pull/9805#discussion_r334032024
 
 

 ##
 File path: docs/dev/stream/state/checkpointing.zh.md
 ##
 @@ -25,146 +25,138 @@ under the License.
 * ToC
 {:toc}
 
-Every function and operator in Flink can be **stateful** (see [working with 
state](state.html) for details).
-Stateful functions store data across the processing of individual 
elements/events, making state a critical building block for
-any type of more elaborate operation.
+Flink 中的每个方法或算子都能够是**有状态的**(阅读 [working with state](state.html) 查看详细)。
+状态化的方法在处理单个 元素/事件 的时候存储数据,让状态成为使各个类型的算子更加精细的重要部分。
+为了让状态容错,Flink 需要为状态添加**Checkpoint(检查点)**。Checkpoint 使得 Flink 
能够恢复状态和在流中的位置,从而向应用提供和无故障执行时一样的语义。
 
-In order to make state fault tolerant, Flink needs to **checkpoint** the 
state. Checkpoints allow Flink to recover state and positions
-in the streams to give the application the same semantics as a failure-free 
execution.
+[Documentation on streaming fault tolerance]({{ site.baseurl 
}}/zh/internals/stream_checkpointing.html) 介绍了 Flink 流计算容错机制的内部技术原理。
 
 Review comment:
   OK

