flink git commit: [doc] fixed typos in "Internals -> Fault Tolerance for Data Streaming"

mjsax Thu, 01 Oct 2015 05:09:31 -0700

Repository: flink
Updated Branches:
  refs/heads/master 846ad7064 -> bbd97354b



[doc] fixed typos in "Internals -> Fault Tolerance for Data Streaming"


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/bbd97354
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/bbd97354
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/bbd97354

Branch: refs/heads/master
Commit: bbd97354b8e681dd68f8ad7528eef433227c5c89
Parents: 846ad70
Author: mjsax <mj...@apache.org>
Authored: Thu Oct 1 14:06:49 2015 +0200
Committer: mjsax <mj...@apache.org>
Committed: Thu Oct 1 14:08:08 2015 +0200

----------------------------------------------------------------------
 docs/internals/stream_checkpointing.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/bbd97354/docs/internals/stream_checkpointing.md
----------------------------------------------------------------------
diff --git a/docs/internals/stream_checkpointing.md b/docs/internals/stream_checkpointing.md
index 27eae6b..1c8f74f 100644
--- a/docs/internals/stream_checkpointing.md
+++ b/docs/internals/stream_checkpointing.md
@@ -20,7 +20,7 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-This document describes Flink' fault tolerance mechanism for streaming data 
flows.
+This document describes Flink's fault tolerance mechanism for streaming data 
flows.
 
 * This will be replaced by the TOC
 {:toc}
@@ -87,9 +87,9 @@ their descendant records) have passed through the entire data 
flow topology.
   <img src="{{ site.baseurl }}/internals/fig/stream_aligning.svg" 
alt="Aligning data streams at operators with multiple inputs" 
style="width:100%; padding-top:10px; padding-bottom:10px;" />
 </div>
 
-Operators that receive more than one input stream need to *align* the input 
streams on the snapshot barriers. The figure above illutrates this:
+Operators that receive more than one input stream need to *align* the input 
streams on the snapshot barriers. The figure above illustrates this:
 
-  - As soon as the operator received snapshot barrier *n* from an incoming 
stream, it cannot process any further records from that stream until it has 
received the
+  - As soon as the operator received snapshot barrier *n* from an incoming 
stream, it cannot process any further records from that stream until it has 
received
 the barrier *n* from the other inputs as well. Otherwise, it would have mixed 
records that belong to snapshot *n* and with records that belong to snapshot 
*n+1*.
   - Streams that report barrier *n* are temporarily set aside. Records that 
are received from these streams are not processed, but put into an input buffer.
   - Once the last stream has received barrier *n*, the operator emits all 
pending outgoing records, and then emits snapshot *n* barriers itself.
@@ -103,7 +103,7 @@ When operators contain any form of *state*, this state must 
be part of the snaps
   - *User-defined state*: This is state that is created and modified directly 
by the transformation functions (like `map()` or `filter()`). User-defined 
state can either be a simple variable in the function's java object, or the 
associated key/value state of a function (see [State in Streaming 
Applications]({{ site.baseurl 
}}/apis/streaming_guide.html#stateful-computation) for details).
   - *System state*: This state refers to data buffers that are part of the 
operator's computation. A typical example for this state are the *window 
buffers*, inside which the system collects (and aggregates) records for windows 
until the window is evaluated and evicted.
 
-Operators snapshot their state at the point in time when they received all 
snapshot barriers from their input streams, before emitting the barriers to 
their output streams. At that point, all updates to the state from records 
before the barriers will have been made, and no updates that depend on records 
from after the barriers have been applied. Because the state of a snapshot may 
be potentially large, it is stored in a configurable *state backend*. By 
default, this is the JobManager's memory, but for serious setups, a distributed 
reliable storage should be configured (such as HDFS). After the state has been 
stored, the operator acknowledges the checkpoint, emity the snapshot barrier 
into the output streams, and proceeds.
+Operators snapshot their state at the point in time when they received all 
snapshot barriers from their input streams, before emitting the barriers to 
their output streams. At that point, all updates to the state from records 
before the barriers will have been made, and no updates that depend on records 
from after the barriers have been applied. Because the state of a snapshot may 
be potentially large, it is stored in a configurable *state backend*. By 
default, this is the JobManager's memory, but for serious setups, a distributed 
reliable storage should be configured (such as HDFS). After the state has been 
stored, the operator acknowledges the checkpoint, emits the snapshot barrier 
into the output streams, and proceeds.
 
 The resulting snapshot now contains:
 
@@ -118,16 +118,16 @@ The resulting snapshot now contains:
 ### Exactly Once vs. At Least Once
 
 The alignment step may add latency to the streaming program. Usually, this 
extra latency is in the order of a few milliseconds, but we have seen cases 
where the latency
-of some outliers increased noticeably. For applications that require 
consistenty super low latencies (few milliseconds) for all records, Flink has a 
switch to skip the 
+of some outliers increased noticeably. For applications that require 
consistently super low latencies (few milliseconds) for all records, Flink has 
a switch to skip the
 stream alignment during a checkpoint. Checkpoint snapshots are still drawn as 
soon as an operator has seen the checkpoint barrier from each input.
 
 When the alignment is skipped, an operator keeps processing all inputs, even 
after some checkpoint barriers for checkpoint *n* arrived. That way, the 
operator also processes
 elements that belong to checkpoint *n+1* before the state snapshot for 
checkpoint *n* was taken.
 On a restore, these records will occur as duplicates, because they are both 
included in the state snapshot of checkpoint *n*, and will be replayed as part
-of the data after checkoint *n*.
+of the data after checkpoint *n*.
 
-*NOTE*: Alignment happens only for operators wih multiple predecessors (joins) 
as well as operators with multiple senders (after a stream 
repartitionging/shuffle).
-Because of that, dataflows with only embarassingly parallel streaming 
operations (`map()`, `flatMap()`, `filter()`, ...) actually give *exactly once* 
guarantees even
+*NOTE*: Alignment happens only for operators with multiple predecessors 
(joins) as well as operators with multiple senders (after a stream 
repartitioning/shuffle).
+Because of that, dataflows with only embarrassingly parallel streaming 
operations (`map()`, `flatMap()`, `filter()`, ...) actually give *exactly once* 
guarantees even
 in *at least once* mode.
 
 <!--
