[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

2017-08-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4441


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

2017-08-03 Thread alpinegizmo
Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131087040
  
--- Diff: docs/dev/stream/state/index.md ---
@@ -0,0 +1,56 @@
+---
+title: "State & Fault Tolerance"
+nav-id: streaming_state
+nav-title: "State & Fault Tolerance"
+nav-parent_id: streaming
+nav-pos: 3
+nav-show_overview: true
+---
+
+
+Stateful functions and operators store data across the processing of individual elements/events, making state a critical building block for
+any type of more elaborate operation.
+
+For example:
+
+  - When an application searches for certain event patterns, the state will store the sequence of events encountered so far.
+  - When aggregating events per minute/hour/day, the state holds the pending aggregates.
+  - When training a machine learning model over a stream of data points, the state holds the current version of the model parameters.
+  - When historic data needs to be managed, the state allows efficient access to events that occurred in the past. 
+
+Flink needs to be aware of the state in order to make state fault tolerant using [checkpoints](checkpointing.html) and allow [savepoints]({{ site.baseurl }}/ops/state/savepoints.html) of streaming applications.
--- End diff --

"and to allow [savepoints]"
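The diff above motivates state with examples such as per-minute aggregates and says Flink must be aware of state to checkpoint it. A minimal plain-Java sketch of that idea follows. It is not Flink's API: `CountingOperator`, `takeSnapshot` and `restoreFrom` are hypothetical names, and the "framework" that calls them is imagined. Because the pending aggregates live in one place the framework can reach, it can copy them at a checkpoint and hand them back after a failure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Flink's API: a per-key counter whose state lives in
// a single map that an (imagined) framework can copy and restore.
class CountingOperator {
    // The registered state: everything needed to resume processing.
    private final Map<String, Long> countsPerKey = new HashMap<>();

    // Processes one event by updating the pending aggregate for its key.
    void processElement(String key) {
        countsPerKey.merge(key, 1L, Long::sum);
    }

    long currentCount(String key) {
        return countsPerKey.getOrDefault(key, 0L);
    }

    // Called by the hypothetical framework when taking a checkpoint.
    Map<String, Long> takeSnapshot() {
        return new HashMap<>(countsPerKey);
    }

    // Called by the hypothetical framework when recovering from a checkpoint.
    void restoreFrom(Map<String, Long> snapshot) {
        countsPerKey.clear();
        countsPerKey.putAll(snapshot);
    }
}
```

Events processed after the snapshot are not part of it; a real system replays them from the source, which this sketch does not model.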




[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

2017-08-03 Thread alpinegizmo
Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131085493
  
--- Diff: docs/dev/stream/state/custom_serialization.md ---
@@ -0,0 +1,188 @@
+---
+title: "Custom Serialization for Managed State"
+nav-title: "Custom Serialization"
+nav-parent_id: streaming_state
+nav-pos: 10
+---
+
+
+If your application uses Flink's managed state, it might be necessary to implement a custom serialization logic for special use cases.
--- End diff --

drop the word "a" in "implement a custom serialization logic" so that it reads "implement custom serialization logic"
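For readers unfamiliar with what "custom serialization logic" involves, here is a minimal plain-Java sketch of the write/read core. It assumes a hypothetical `(name, count)` value and hypothetical classes `NameCount` and `NameCountSerializer`. Flink's real `TypeSerializer` abstract class has many more methods (copying, instance creation, length, and so on), so this is not a drop-in implementation, only the part that defines the binary layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type used only for this illustration.
final class NameCount {
    final String name;
    final int count;

    NameCount(String name, int count) {
        this.name = name;
        this.count = count;
    }
}

// Hypothetical serializer, NOT Flink's TypeSerializer.
final class NameCountSerializer {
    // Writes the fields in a fixed order: [length-prefixed modified-UTF-8 name][4-byte count].
    void serialize(NameCount value, DataOutputStream out) throws IOException {
        out.writeUTF(value.name);
        out.writeInt(value.count);
    }

    // Reads the fields back in exactly the order they were written.
    NameCount deserialize(DataInputStream in) throws IOException {
        String name = in.readUTF();
        int count = in.readInt();
        return new NameCount(name, count);
    }

    // Convenience: round trip through an in-memory byte array.
    byte[] toBytes(NameCount value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            serialize(value, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    NameCount fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return deserialize(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key constraint is symmetry: `deserialize` must read exactly what `serialize` wrote, in the same order.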




[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

2017-08-03 Thread alpinegizmo
Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131086039
  
--- Diff: docs/dev/stream/state/custom_serialization.md ---
@@ -0,0 +1,188 @@
+---
+title: "Custom Serialization for Managed State"
+nav-title: "Custom Serialization"
+nav-parent_id: streaming_state
+nav-pos: 10
+---
+
+
+If your application uses Flink's managed state, it might be necessary to implement a custom serialization logic for special use cases.
+
+This page is targeted as a guideline for users who require the use of custom serialization for their state, covering how
+to provide a custom serializer and how to handle upgrades to the serializer for compatibility. If you're simply using
+Flink's own serializers, this page is irrelevant and can be skipped.
+
+### Using custom serializers
+
+As demonstrated in the above examples, when registering a managed operator or keyed state, a `StateDescriptor` is required
+to specify the state's name, as well as information about the type of the state. The type information is used by Flink's
+[type serialization framework](../../types_serialization.html) to create appropriate serializers for the state.
+
+It is also possible to completely bypass this and let Flink use your own custom serializer to serialize managed states,
+simply by directly instantiating the `StateDescriptor` with your own `TypeSerializer` implementation:
+
+
+
+{% highlight java %}
+public class CustomTypeSerializer extends TypeSerializer<Tuple2<String, Integer>> {...};
+
+ListStateDescriptor<Tuple2<String, Integer>> descriptor =
+    new ListStateDescriptor<>(
+        "state-name",
+        new CustomTypeSerializer());
+
+checkpointedState = getRuntimeContext().getListState(descriptor);
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+class CustomTypeSerializer extends TypeSerializer[(String, Integer)] {...}
+
+val descriptor = new ListStateDescriptor[(String, Integer)](
+    "state-name",
+    new CustomTypeSerializer)
+
+checkpointedState = getRuntimeContext.getListState(descriptor)
+{% endhighlight %}
+
+
+
+Note that Flink writes state serializers along with the state as metadata. In certain cases on restore (see following
+subsections), the written serializer needs to be deserialized and used. Therefore, it is recommended to avoid using
+anonymous classes as your state serializers. Anonymous classes do not have a guarantee on the generated classname,
+varying across compilers and depends on the order that they are instantiated within the enclosing class, which can 
--- End diff --

"varying across compilers and depends" ==> "which varies across compilers and depends"
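To make the point about generated class names concrete, here is a small plain-Java illustration, independent of Flink (`SerializerNames` and `StableSerializer` are hypothetical names). A named nested class gets a binary name derived from the source, while javac numbers anonymous classes by their position in the enclosing class (`Outer$1`, `Outer$2`, ...). Since Flink writes the serializer along with the state, a name that shifts after a code change could stop resolving on restore.

```java
// Plain-Java illustration (not Flink code) of how class names are assigned.
class SerializerNames {
    // A named nested class: its binary name comes from the source,
    // e.g. SerializerNames$StableSerializer, and survives recompilation.
    static class StableSerializer { }

    static String namedClassName() {
        return StableSerializer.class.getName();
    }

    // An anonymous class: javac typically names it SerializerNames$1 because it is
    // the first anonymous class in SerializerNames; adding another anonymous
    // class above it would shift the number.
    static Object anonymousInstance() {
        return new Object() { };
    }
}
```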

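The page outline quoted above also promises guidance on handling serializer upgrades for compatibility. As a hedged, plain-Java illustration of the general idea only, the hypothetical `VersionedCountSerializer` below prefixes each record with a format version so that a newer reader can still decode data written by an older writer. This is not Flink's mechanism: Flink has its own serializer compatibility checks, which this sketch does not reproduce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type used only for this illustration.
final class VersionedCount {
    final String name;
    final int count;

    VersionedCount(String name, int count) {
        this.name = name;
        this.count = count;
    }
}

// Hypothetical upgrade-tolerant serializer, NOT Flink's compatibility API.
// Format version 1 stored only the name; version 2 also stores a count.
final class VersionedCountSerializer {
    static final int V1 = 1;
    static final int V2 = 2;

    // Writes the current (version 2) format: [version][name][count].
    static byte[] write(VersionedCount value) {
        return encode(V2, value.name, value.count);
    }

    // Simulates bytes produced by the old version 1 writer: [version][name].
    static byte[] writeLegacyV1(String name) {
        return encode(V1, name, 0);
    }

    private static byte[] encode(int version, String name, int count) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(version);
            out.writeUTF(name);
            if (version >= V2) {
                out.writeInt(count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads either version; records from version 1 get a default count of 0.
    static VersionedCount read(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int version = in.readInt();
            String name = in.readUTF();
            switch (version) {
                case V1: return new VersionedCount(name, 0);
                case V2: return new VersionedCount(name, in.readInt());
                default: throw new IllegalStateException("Unknown format version " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The design choice here is to keep the old layout readable forever and fill new fields with defaults; an alternative is a one-time migration that rewrites old records in the new format.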



[GitHub] flink pull request #4441: [FLINK-7301] [docs] Rework state documentation

2017-07-31 Thread twalthr
GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/4441

[FLINK-7301] [docs] Rework state documentation

## What is the purpose of the change

*This PR restructures state-related documentation pages. It introduces a state introduction page and moves some files (from `setup/` to `ops/`) according to the new documentation structure.*

## Brief change log

*Documentation changes only.*

## Verifying this change

*Built with the build script and links checked.*

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): no
  - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
  - The serializers: no
  - The runtime per-record code paths (performance sensitive): no
  - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no

## Documentation

  - Does this pull request introduce a new feature? no
  - If yes, how is the feature documented? not applicable



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink FLINK-7301

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4441.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4441


commit b53e758216364903f211277052dbba4ae99da7d3
Author: twalthr 
Date:   2017-07-31T18:14:31Z

[FLINK-7301] [docs] Rework state documentation



