[jira] [Commented] (FLINK-7301) Rework state documentation

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118254#comment-16118254
 ] 

ASF GitHub Bot commented on FLINK-7301:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4441


> Rework state documentation
> --
>
> Key: FLINK-7301
> URL: https://issues.apache.org/jira/browse/FLINK-7301
> Project: Flink
> Issue Type: Improvement
> Components: Documentation
> Reporter: Timo Walther
> Assignee: Timo Walther
>
> The documentation about state is spread across different pages, but this is 
> not consistent and it is hard to find what you need. I propose:
> Mention State Backends and link to them in "Streaming/Working with State".
> Create category "State & Fault Tolerance" under "Streaming". Move 
> "Working with State", "Checkpointing" and "Queryable State".
> Move API related parts (90%) of "Deployment/State & Fault Tolerance/State 
> Backends" to "Streaming/State & Fault Tolerance/State Backends".
> Move all tuning things from "Debugging/Large State" to "Deployment/State & 
> Fault Tolerance/State Backends".
> Move "Streaming/Working with State/Custom Serialization for Managed State" 
> to "Streaming/State & Fault Tolerance/Custom Serialization" (Add a link 
> from previous position, also link from "Data Types & Serialization").



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7301) Rework state documentation

2017-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118251#comment-16118251
 ] 

ASF GitHub Bot commented on FLINK-7301:
---

Github user twalthr commented on the issue:

https://github.com/apache/flink/pull/4441
  
Thanks @alpinegizmo. I will merge this now.




[jira] [Commented] (FLINK-7301) Rework state documentation

2017-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112535#comment-16112535
 ] 

ASF GitHub Bot commented on FLINK-7301:
---

Github user alpinegizmo commented on the issue:

https://github.com/apache/flink/pull/4441
  
@twalthr Duh, of course, you're right. 

+1




[jira] [Commented] (FLINK-7301) Rework state documentation

2017-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112504#comment-16112504
 ] 

ASF GitHub Bot commented on FLINK-7301:
---

Github user twalthr commented on the issue:

https://github.com/apache/flink/pull/4441
  
@alpinegizmo I thought about adding redirects, but we would be in redirect 
hell if we added one for every page that moves in the future. Actually, only 
links to the master docs change, and we should not use links to master docs in 
trainings/stackoverflow anyway. Proper links to released docs remain unchanged.




[jira] [Commented] (FLINK-7301) Rework state documentation

2017-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112488#comment-16112488
 ] 

ASF GitHub Bot commented on FLINK-7301:
---

Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131085493
  
--- Diff: docs/dev/stream/state/custom_serialization.md ---
@@ -0,0 +1,188 @@
+---
+title: "Custom Serialization for Managed State"
+nav-title: "Custom Serialization"
+nav-parent_id: streaming_state
+nav-pos: 10
+---
+
+
+If your application uses Flink's managed state, it might be necessary to 
implement a custom serialization logic for special use cases.
--- End diff --

drop the word "a" in "implement a custom serialization logic" so that it 
reads "implement custom serialization logic"




[jira] [Commented] (FLINK-7301) Rework state documentation

2017-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112486#comment-16112486
 ] 

ASF GitHub Bot commented on FLINK-7301:
---

Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131086039
  
--- Diff: docs/dev/stream/state/custom_serialization.md ---
@@ -0,0 +1,188 @@
+---
+title: "Custom Serialization for Managed State"
+nav-title: "Custom Serialization"
+nav-parent_id: streaming_state
+nav-pos: 10
+---
+
+
+If your application uses Flink's managed state, it might be necessary to 
implement a custom serialization logic for special use cases.
+
+This page is targeted as a guideline for users who require the use of 
custom serialization for their state, covering how
+to provide a custom serializer and how to handle upgrades to the 
serializer for compatibility. If you're simply using
+Flink's own serializers, this page is irrelevant and can be skipped.
+
+### Using custom serializers
+
+As demonstrated in the above examples, when registering a managed operator 
or keyed state, a `StateDescriptor` is required
+to specify the state's name, as well as information about the type of the 
state. The type information is used by Flink's
+[type serialization framework](../../types_serialization.html) to create 
appropriate serializers for the state.
+
+It is also possible to completely bypass this and let Flink use your own 
custom serializer to serialize managed states,
+simply by directly instantiating the `StateDescriptor` with your own 
`TypeSerializer` implementation:
+
+
+
+{% highlight java %}
+public class CustomTypeSerializer extends TypeSerializer<Tuple2<String, Integer>> {...};
+
+ListStateDescriptor<Tuple2<String, Integer>> descriptor =
+new ListStateDescriptor<>(
+"state-name",
+new CustomTypeSerializer());
+
+checkpointedState = getRuntimeContext().getListState(descriptor);
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+class CustomTypeSerializer extends TypeSerializer[(String, Integer)] {...}
+
+val descriptor = new ListStateDescriptor[(String, Integer)](
+"state-name",
+new CustomTypeSerializer)
+)
+
+checkpointedState = getRuntimeContext.getListState(descriptor);
+{% endhighlight %}
+
+
+
+Note that Flink writes state serializers along with the state as metadata. 
In certain cases on restore (see following
+subsections), the written serializer needs to be deserialized and used. 
Therefore, it is recommended to avoid using
+anonymous classes as your state serializers. Anonymous classes do not have 
a guarantee on the generated classname,
+varying across compilers and depends on the order that they are 
instantiated within the enclosing class, which can 
--- End diff --

"varying across compilers and depends" ==> "which varies across compilers 
and depends"




[jira] [Commented] (FLINK-7301) Rework state documentation

2017-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112487#comment-16112487
 ] 

ASF GitHub Bot commented on FLINK-7301:
---

Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131087040
  
--- Diff: docs/dev/stream/state/index.md ---
@@ -0,0 +1,56 @@
+---
+title: "State & Fault Tolerance"
+nav-id: streaming_state
+nav-title: "State & Fault Tolerance"
+nav-parent_id: streaming
+nav-pos: 3
+nav-show_overview: true
+---
+
+
+Stateful functions and operators store data across the processing of 
individual elements/events, making state a critical building block for
+any type of more elaborate operation.
+
+For example:
+
+  - When an application searches for certain event patterns, the state 
will store the sequence of events encountered so far.
+  - When aggregating events per minute/hour/day, the state holds the 
pending aggregates.
+  - When training a machine learning model over a stream of data points, 
the state holds the current version of the model parameters.
+  - When historic data needs to be managed, the state allows efficient 
access to events that occurred in the past. 
+
+Flink needs to be aware of the state in order to make state fault tolerant 
using [checkpoints](checkpointing.html) and allow [savepoints]({{ site.baseurl 
}}/ops/state/savepoints.html) of streaming applications.
--- End diff --

"and to allow [savepoints]"




[jira] [Commented] (FLINK-7301) Rework state documentation

2017-07-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107732#comment-16107732
 ] 

ASF GitHub Bot commented on FLINK-7301:
---

Github user twalthr commented on the issue:

https://github.com/apache/flink/pull/4441
  
CC @alpinegizmo 




[jira] [Commented] (FLINK-7301) Rework state documentation

2017-07-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107731#comment-16107731
 ] 

ASF GitHub Bot commented on FLINK-7301:
---

GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/4441

[FLINK-7301] [docs] Rework state documentation

## What is the purpose of the change

*This PR restructures the state-related documentation pages. It adds a 
state introduction page and moves some files (from `setup/` to `ops/`) 
according to the new documentation structure.*

## Brief change log

*Documentation changes only.*

## Verifying this change

*Built with the build script and checked the links.*

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): no
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
  - The serializers: no
  - The runtime per-record code paths (performance sensitive): no
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: no

## Documentation

  - Does this pull request introduce a new feature? no
  - If yes, how is the feature documented? not applicable



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink FLINK-7301

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4441.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4441


commit b53e758216364903f211277052dbba4ae99da7d3
Author: twalthr 
Date:   2017-07-31T18:14:31Z

[FLINK-7301] [docs] Rework state documentation



