[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4441

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation
Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131087040

--- Diff: docs/dev/stream/state/index.md ---
@@ -0,0 +1,56 @@
+---
+title: "State & Fault Tolerance"
+nav-id: streaming_state
+nav-title: "State & Fault Tolerance"
+nav-parent_id: streaming
+nav-pos: 3
+nav-show_overview: true
+---
+
+
+Stateful functions and operators store data across the processing of individual elements/events, making state a critical building block for
+any type of more elaborate operation.
+
+For example:
+
+ - When an application searches for certain event patterns, the state will store the sequence of events encountered so far.
+ - When aggregating events per minute/hour/day, the state holds the pending aggregates.
+ - When training a machine learning model over a stream of data points, the state holds the current version of the model parameters.
+ - When historic data needs to be managed, the state allows efficient access to events occured in the past.
+
+Flink needs to be aware of the state in order to make state fault tolerant using [checkpoints](checkpointing.html) and allow [savepoints]({{ site.baseurl }}/ops/state/savepoints.html) of streaming applications.
--- End diff --

"and to allow [savepoints]"
[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation
Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131085493

--- Diff: docs/dev/stream/state/custom_serialization.md ---
@@ -0,0 +1,188 @@
+---
+title: "Custom Serialization for Managed State"
+nav-title: "Custom Serialization"
+nav-parent_id: streaming_state
+nav-pos: 10
+---
+
+
+If your application uses Flink's managed state, it might be necessary to implement a custom serialization logic for special use cases.
--- End diff --

Drop the word "a" in "implement a custom serialization logic" so that it reads "implement custom serialization logic".
[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation
Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131086039

--- Diff: docs/dev/stream/state/custom_serialization.md ---
@@ -0,0 +1,188 @@
+---
+title: "Custom Serialization for Managed State"
+nav-title: "Custom Serialization"
+nav-parent_id: streaming_state
+nav-pos: 10
+---
+
+
+If your application uses Flink's managed state, it might be necessary to implement a custom serialization logic for special use cases.
+
+This page is targeted as a guideline for users who require the use of custom serialization for their state, covering how
+to provide a custom serializer and how to handle upgrades to the serializer for compatibility. If you're simply using
+Flink's own serializers, this page is irrelevant and can be skipped.
+
+### Using custom serializers
+
+As demonstrated in the above examples, when registering a managed operator or keyed state, a `StateDescriptor` is required
+to specify the state's name, as well as information about the type of the state. The type information is used by Flink's
+[type serialization framework](../../types_serialization.html) to create appropriate serializers for the state.
+
+It is also possible to completely bypass this and let Flink use your own custom serializer to serialize managed states,
+simply by directly instantiating the `StateDescriptor` with your own `TypeSerializer` implementation:
+
+{% highlight java %}
+public class CustomTypeSerializer extends TypeSerializer<Tuple2<String, Integer>> {...};
+
+ListStateDescriptor<Tuple2<String, Integer>> descriptor =
+    new ListStateDescriptor<>(
+        "state-name",
+        new CustomTypeSerializer());
+
+checkpointedState = getRuntimeContext().getListState(descriptor);
+{% endhighlight %}
+
+{% highlight scala %}
+class CustomTypeSerializer extends TypeSerializer[(String, Integer)] {...}
+
+val descriptor = new ListStateDescriptor[(String, Integer)](
+    "state-name",
+    new CustomTypeSerializer)
+)
+
+checkpointedState = getRuntimeContext.getListState(descriptor);
+{% endhighlight %}
+
+Note that Flink writes state serializers along with the state as metadata. In certain cases on restore (see following
+subsections), the written serializer needs to be deserialized and used. Therefore, it is recommended to avoid using
+anonymous classes as your state serializers. Anonymous classes do not have a guarantee on the generated classname,
+varying across compilers and depends on the order that they are instantiated within the enclosing class, which can
--- End diff --

"varying across compilers and depends" ==> "which varies across compilers and depends"
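The quoted note about anonymous classes can be illustrated with plain Java, independent of Flink. The following is only a minimal sketch under stated assumptions: `AnonymousNameDemo`, `Codec`, and `NamedCodec` are hypothetical names introduced for illustration, and `Codec` is a stand-in interface, not Flink's `TypeSerializer`. It contrasts the class name of a named nested class, which is fixed by the source, with that of an anonymous class, which is generated by the compiler.

```java
import java.io.Serializable;

public class AnonymousNameDemo {

    /** Hypothetical stand-in for a serializer type; this is NOT Flink's TypeSerializer. */
    public interface Codec extends Serializable {
        String describe();
    }

    /** Named nested class: its name is written in the source and stays stable. */
    public static final class NamedCodec implements Codec {
        private static final long serialVersionUID = 1L;

        @Override
        public String describe() {
            return "named";
        }
    }

    /** Returns an instance of an anonymous class; its class name is compiler-generated. */
    public static Codec anonymousCodec() {
        return new Codec() {
            private static final long serialVersionUID = 1L;

            @Override
            public String describe() {
                return "anonymous";
            }
        };
    }

    public static void main(String[] args) {
        // Prints the enclosing class name followed by "$NamedCodec".
        System.out.println(new NamedCodec().getClass().getName());
        // Prints the enclosing class name followed by "$" and compiler-chosen digits.
        System.out.println(anonymousCodec().getClass().getName());
    }
}
```

The Java Language Specification only requires that an anonymous class's binary name be the enclosing class's name followed by `$` and a sequence of digits; which digits are used is left to the compiler. A class name stored alongside serialized data may therefore fail to resolve, or resolve to a different class, after the enclosing class is edited or recompiled, whereas a named class such as `NamedCodec` keeps the same name.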
[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation
GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/4441

[FLINK-7301] [docs] Rework state documentation

## What is the purpose of the change

*This PR restructures the state-related documentation pages. It introduces a state introduction page and moves some files (from `setup/` to `ops/`) according to the new documentation structure.*

## Brief change log

*Documentation changes only.*

## Verifying this change

*Built with the build script and checked the links.*

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no

## Documentation

- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink FLINK-7301

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4441.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4441

commit b53e758216364903f211277052dbba4ae99da7d3
Author: twalthr
Date: 2017-07-31T18:14:31Z

[FLINK-7301] [docs] Rework state documentation