[GitHub] [flink] flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor
flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor URL: https://github.com/apache/flink/pull/11098#issuecomment-586328210 ## CI report: * df7f0151d94bc7705c87baf855ae3d8d57f7e463 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148994879) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5187) * 311d3e9843bd601a8de8bee78c2ecd34222d19d6 UNKNOWN * e72b2a38b7ec718b44a14f98897dbaf22f9d0d0d Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149078552) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5196) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] carp84 commented on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows.
carp84 commented on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows. URL: https://github.com/apache/flink/pull/11095#issuecomment-586565223 Thanks for the note @StephanEwen, checking now.
[jira] [Updated] (FLINK-16068) table with keyword-escaped columns and computed_column_expression columns
[ https://issues.apache.org/jira/browse/FLINK-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu updated FLINK-16068: Fix Version/s: 1.10.1 > table with keyword-escaped columns and computed_column_expression columns > - > > Key: FLINK-16068 > URL: https://issues.apache.org/jira/browse/FLINK-16068 > Project: Flink > Issue Type: Bug > Components: Table SQL / Client >Affects Versions: 1.10.0 >Reporter: pangliang >Priority: Major > Fix For: 1.10.1 > > > I use sql-client to create a table with keyword-escaped column and > computed_column_expression column, like this: > {code:java} > CREATE TABLE source_kafka ( > log STRING, > `time` BIGINT, > pt as proctime() > ) WITH ( > 'connector.type' = 'kafka', > 'connector.version' = 'universal', > 'connector.topic' = 'k8s-logs', > 'connector.startup-mode' = 'latest-offset', > 'connector.properties.zookeeper.connect' = > 'zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181/kafka', > 'connector.properties.bootstrap.servers' = 'kafka.default:9092', > 'connector.properties.group.id' = 'testGroup', > 'format.type'='json', > 'format.fail-on-missing-field' = 'true', > 'update-mode' = 'append' > ); > {code} > Then I simply used it : > {code:java} > SELECT * from source_kafka limit 10;{code} > got an exception: > {code:java} > java.io.IOException: Fail to run stream sql job > at > org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:164) > at > org.apache.zeppelin.flink.FlinkStreamSqlInterpreter.callSelect(FlinkStreamSqlInterpreter.java:108) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.callCommand(FlinkSqlInterrpeter.java:203) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.runSqlList(FlinkSqlInterrpeter.java:151) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.interpret(FlinkSqlInterrpeter.java:104) > at > org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103) > at > 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:676) > at > org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:569) > at org.apache.zeppelin.scheduler.Job.run(Job.java:172) > at > org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) > at > org.apache.zeppelin.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.table.api.SqlParserException: SQL parse failed. > Encountered "time" at line 1, column 12. > Was expecting one of: > "ABS" ... > "ARRAY" ... > "AVG" ... > "CARDINALITY" ... > "CASE" ... > "CAST" ... > "CEIL" ... > "CEILING" ... > .. > > at > org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:50) > at > org.apache.flink.table.planner.calcite.SqlExprToRexConverterImpl.convertToRexNodes(SqlExprToRexConverterImpl.java:79) > at > org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.scala:111) > at > org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3328) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2357) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2051) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2005) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:646) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181) > at > 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563) > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:148) > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:135) > at > org.apache.flink.table.planner.operations.SqlToOperationConverter.toQueryOperation(SqlToOperationConverter.java:522) > at > org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlQuery(SqlToOperationConverter.java:436) > at >
[jira] [Commented] (FLINK-16068) table with keyword-escaped columns and computed_column_expression columns
[ https://issues.apache.org/jira/browse/FLINK-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037445#comment-17037445 ] Jark Wu commented on FLINK-16068: - cc [~danny0405], could you help take a look at this? It seems there is a bug in the SQL parser when the table definition contains both a keyword-escaped column and a computed column. > table with keyword-escaped columns and computed_column_expression columns > - > > Key: FLINK-16068 > URL: https://issues.apache.org/jira/browse/FLINK-16068 > Project: Flink > Issue Type: Bug > Components: Table SQL / Client >Affects Versions: 1.10.0 >Reporter: pangliang >Priority: Major > > I use sql-client to create a table with keyword-escaped column and > computed_column_expression column, like this: > {code:java} > CREATE TABLE source_kafka ( > log STRING, > `time` BIGINT, > pt as proctime() > ) WITH ( > 'connector.type' = 'kafka', > 'connector.version' = 'universal', > 'connector.topic' = 'k8s-logs', > 'connector.startup-mode' = 'latest-offset', > 'connector.properties.zookeeper.connect' = > 'zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181/kafka', > 'connector.properties.bootstrap.servers' = 'kafka.default:9092', > 'connector.properties.group.id' = 'testGroup', > 'format.type'='json', > 'format.fail-on-missing-field' = 'true', > 'update-mode' = 'append' > ); > {code} > Then I simply used it : > {code:java} > SELECT * from source_kafka limit 10;{code} > got an exception: > {code:java} > java.io.IOException: Fail to run stream sql job > at > org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:164) > at > org.apache.zeppelin.flink.FlinkStreamSqlInterpreter.callSelect(FlinkStreamSqlInterpreter.java:108) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.callCommand(FlinkSqlInterrpeter.java:203) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.runSqlList(FlinkSqlInterrpeter.java:151) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.interpret(FlinkSqlInterrpeter.java:104) > at > 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103) > at > org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:676) > at > org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:569) > at org.apache.zeppelin.scheduler.Job.run(Job.java:172) > at > org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) > at > org.apache.zeppelin.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.table.api.SqlParserException: SQL parse failed. > Encountered "time" at line 1, column 12. > Was expecting one of: > "ABS" ... > "ARRAY" ... > "AVG" ... > "CARDINALITY" ... > "CASE" ... > "CAST" ... > "CEIL" ... > "CEILING" ... > .. 
> > at > org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:50) > at > org.apache.flink.table.planner.calcite.SqlExprToRexConverterImpl.convertToRexNodes(SqlExprToRexConverterImpl.java:79) > at > org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.scala:111) > at > org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3328) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2357) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2051) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2005) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:646) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563) > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:148) > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:135) > at >
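The parse failure reported in FLINK-16068 above can be illustrated with a minimal sketch. The class and method below are hypothetical helpers for illustration only, not Flink's actual planner code: the suspected failure mode is that, when expanding a query over a table with a computed column, the planner re-parses column expressions as SQL text, and a reserved word such as `time` that is emitted without backticks is rejected by the parser ("Encountered \"time\" at line 1, column 12").

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of identifier quoting (not Flink's planner code).
// A reserved word used as a column name must be backtick-escaped before the
// generated expression text is handed back to the SQL parser.
public class IdentifierQuoting {
    // A tiny stand-in for the parser's reserved-word list (assumption: the
    // real list is Calcite's, which is much larger).
    static final Set<String> RESERVED = Set.of("TIME", "DATE", "TIMESTAMP");

    // Re-adding backtick quoting for reserved words avoids the parse failure.
    static String quoteIfReserved(String identifier) {
        return RESERVED.contains(identifier.toUpperCase(Locale.ROOT))
                ? "`" + identifier + "`"
                : identifier;
    }
}
```

Under this assumption, the generated projection would read `` SELECT log, `time`, PROCTIME() AS pt `` instead of the unescaped form that triggers the exception.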
[jira] [Closed] (FLINK-16052) Homebrew test failed with 1.10.0 dist package
[ https://issues.apache.org/jira/browse/FLINK-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li closed FLINK-16052. - Fix Version/s: (was: 1.10.1) (was: 1.11.0) Assignee: Yu Li Resolution: Not A Bug The Homebrew PR has been merged, so closing the JIRA and removing the fix versions since this is not a Flink bug. Thanks all for the efforts! > Homebrew test failed with 1.10.0 dist package > - > > Key: FLINK-16052 > URL: https://issues.apache.org/jira/browse/FLINK-16052 > Project: Flink > Issue Type: Bug > Components: Deployment / Scripts >Affects Versions: 1.10.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Major > > After updating the Homebrew formula to 1.10.0 (in {{$(brew --repository > homebrew/core)}} directory) with patch of this > [PR|https://github.com/Homebrew/homebrew-core/pull/50110], executing `brew > install --build-from-source Formula/apache-flink.rb` and then `brew test > Formula/apache-flink.rb`, we could see below error: > {code:java} > [ERROR] Unexpected result: Error: Could not find or load main class > org.apache.flink.runtime.util.BashJavaUtils > [ERROR] The last line of the BashJavaUtils outputs is expected to be the > execution result, following the prefix 'BASH_JAVA_UTILS_EXEC_RESULT:' > Picked up _JAVA_OPTIONS: > -Djava.io.tmpdir=/private/tmp/apache-flink-test-20200214-33361-1jotper > -Duser.home=/Users/jueding/Library/Caches/Homebrew/java_cache > Error: Could not find or load main class > org.apache.flink.runtime.util.BashJavaUtils > [ERROR] Could not get JVM parameters properly. 
> Error: apache-flink: failed > Failed executing: > {code} > After a bisect checking on {{flink-dist/src/main/flink-bin/bin}} changes, > confirmed the above issue is related to FLINK-15488, but we will see new > errors like below after reverting FLINK-15488 (and FLINK-15519): > {code:java} > ==> /usr/local/Cellar/apache-flink/1.10.0/libexec/bin/start-cluster.sh > ==> /usr/local/Cellar/apache-flink/1.10.0/bin/flink run -p 1 > /usr/local/Cellar/apache-flink/1.10.0/libexec/examples/streaming/WordCount.jar > --input input --output result > Last 15 lines from > /Users/jueding/Library/Logs/Homebrew/apache-flink/test.02.flink: > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > ... 4 more > Caused by: > org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException: > Could not fulfill slot request b7f17c0928112209ae873d089123b1c6. Requested > resource profile (ResourceProfile{UNKNOWN}) is unfulfillable. 
> at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.lambda$fulfillPendingSlotRequestWithPendingTaskManagerSlot$9(SlotManagerImpl.java:772) > at > org.apache.flink.util.OptionalConsumer.ifNotPresent(OptionalConsumer.java:52) > at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.fulfillPendingSlotRequestWithPendingTaskManagerSlot(SlotManagerImpl.java:768) > at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.lambda$internalRequestSlot$7(SlotManagerImpl.java:755) > at > org.apache.flink.util.OptionalConsumer.ifNotPresent(OptionalConsumer.java:52) > at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.internalRequestSlot(SlotManagerImpl.java:755) > at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.registerSlotRequest(SlotManagerImpl.java:314) > ... 27 more > Error: apache-flink: failed > Failed executing: > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor
flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor URL: https://github.com/apache/flink/pull/11098#issuecomment-586328210 ## CI report: * df7f0151d94bc7705c87baf855ae3d8d57f7e463 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148994879) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5187) * 311d3e9843bd601a8de8bee78c2ecd34222d19d6 UNKNOWN * e72b2a38b7ec718b44a14f98897dbaf22f9d0d0d UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] rkhachatryan commented on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
rkhachatryan commented on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586562655 Thanks for the feedback @AHeise, I've addressed the issue.
[GitHub] [flink] flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor
flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor URL: https://github.com/apache/flink/pull/11098#issuecomment-586328210 ## CI report: * df7f0151d94bc7705c87baf855ae3d8d57f7e463 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148994879) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5187) * 311d3e9843bd601a8de8bee78c2ecd34222d19d6 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#issuecomment-581294934 ## CI report: * 65debfebc1bad89160a6375b9c78ea8eb063 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/147155607) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4764) * 1385f42a76ba0060d37022873064e736d52ea0af Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/147163976) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4766) * 9c1edd078c466ee56f2aed16ce9098e35bfae69e Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149075460) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5194) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[jira] [Closed] (FLINK-14802) Multi vectorized read version support for hive orc read
[ https://issues.apache.org/jira/browse/FLINK-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-14802. Release Note: Support vectorized read for Hive 1.x versions. Resolution: Fixed master: 63e9e9d200ef4f17097ae236be09d308efe8b72a > Multi vectorized read version support for hive orc read > --- > > Key: FLINK-14802 > URL: https://issues.apache.org/jira/browse/FLINK-14802 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive, Connectors / ORC >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The vector api of Hive 1.x version are totally different from hive 2+. > So we need introduce more efforts to support vectorized read for hive 1.x.
[jira] [Updated] (FLINK-14802) Multi vectorized read version support for hive orc read
[ https://issues.apache.org/jira/browse/FLINK-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li updated FLINK-14802: - Release Note: Support vectorized read of ORC in Hive 1.x versions. (was: Support vectorized read for Hive 1.x versions.) > Multi vectorized read version support for hive orc read > --- > > Key: FLINK-14802 > URL: https://issues.apache.org/jira/browse/FLINK-14802 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive, Connectors / ORC >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The vector api of Hive 1.x version are totally different from hive 2+. > So we need introduce more efforts to support vectorized read for hive 1.x.
[GitHub] [flink] bowenli86 commented on issue #10730: [FLINK-14802][orc][hive] Multi vectorized read version support for hive orc read
bowenli86 commented on issue #10730: [FLINK-14802][orc][hive] Multi vectorized read version support for hive orc read URL: https://github.com/apache/flink/pull/10730#issuecomment-586558519 @JingsongLi thanks for the great effort!
[GitHub] [flink] bowenli86 closed pull request #10730: [FLINK-14802][orc][hive] Multi vectorized read version support for hive orc read
bowenli86 closed pull request #10730: [FLINK-14802][orc][hive] Multi vectorized read version support for hive orc read URL: https://github.com/apache/flink/pull/10730
[GitHub] [flink] hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#discussion_r379713878 ## File path: flink-ml-parent/flink-ml-lib/pom.xml ## @@ -57,4 +57,30 @@ under the License. 1.1.2 + + + + + + org.apache.maven.plugins + maven-shade-plugin + + + shade-flink + package + + shade + + + + + com.github.fommil.netlib:core Review comment: @zentol The license has already been added by this commit https://github.com/apache/flink/pull/8631/files Or do you mean we should remove the licensing for release-1.10? I think we can create a fix in another PR.
[GitHub] [flink] flinkbot edited a comment on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict
flinkbot edited a comment on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict URL: https://github.com/apache/flink/pull/10151#issuecomment-552442468 ## CI report: * c6d7f5e864076448dca590035a6a590dc5e25c44 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/135928709) * 682da0aec5dee14c09583468d15115e2a512c827 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/140139283) * 9c66fbe1e4d81c3656eba38d56d39dfe0c065f4f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142108012) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3864) * 1b23c2232e9717218a7c61c930c481cbcf2e6f2e Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/142214794) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3888) * 063b5e87dcef4a363f01a48e4af4fb9d3670429f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142218414) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3890) * d5dfd65a163634584e8eaeee452d5454b2d4fe45 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/143576795) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4193) * d1f5b89012dc266fc0c664085c1a2aef0c8b95ec Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/143669093) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4204) * f61c52bcb1420572c2ad94e3d2f1caafbf7f6081 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/143708993) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4215) * 9ea2c427c8f7046d78c122eea8f2d0c10c200224 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144479909) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4362) * fd4f362ceed0a4785c449c807cb46dac15f712ea Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148094252) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4978) * 8e6f927d76fed13e4c095619dee0d72e6a27359f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148125056) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4983) * 12a890c54ecc602a3d46e07e8ded4263469c Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148185789) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5005) * 1a8f90917d5fe5b928f4f70bad62a06f1ef9e3fa Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148207317) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5015) * d0f2df460885f14ad4d4f0331bde5ff2de9664ec Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148235635) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5018) * 97bdd27325be54d1ebfcdc2a4bb21b31d910d039 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148335745) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5040) * 4a56e798775f7f320a3e7b1436d28b00392caa09 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148395560) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5068) * 564913126f4ac833b52f587e2ccd9de2183dfd5f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148509000) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5081) * b2951a6743c376d92371d6eaf3f9be7048a3c29c Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149072760) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5193) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#issuecomment-581294934 ## CI report: * 65debfebc1bad89160a6375b9c78ea8eb063 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/147155607) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4764) * 1385f42a76ba0060d37022873064e736d52ea0af Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/147163976) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4766) * 9c1edd078c466ee56f2aed16ce9098e35bfae69e Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/149075460) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5194) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] buptljy commented on a change in pull request #11048: [FLINK-15966][runtime] Capture callstacks for RPC ask() calls to improve exceptions.
buptljy commented on a change in pull request #11048: [FLINK-15966][runtime] Capture callstacks for RPC ask() calls to improve exceptions. URL: https://github.com/apache/flink/pull/11048#discussion_r379743783 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaInvocationHandler.java ## @@ -370,4 +375,33 @@ public String getHostname() { public CompletableFuture getTerminationFuture() { return terminationFuture; } + + static Object deserializeValueIfNeeded(Object o, Method method) { + if (o instanceof SerializedValue) { + try { + return ((SerializedValue) o).deserializeValue(AkkaInvocationHandler.class.getClassLoader()); + } catch (IOException | ClassNotFoundException e) { + throw new CompletionException( + new RpcException( + "Could not deserialize the serialized payload of RPC method : " + method.getName(), e)); + } + } else { + return o; + } + } + + static Throwable resolveTimeoutException(Throwable exception, @Nullable Throwable callStackCapture, Method method) { + if (callStackCapture == null || (!(exception instanceof akka.pattern.AskTimeoutException))) { + return exception; + } + + final TimeoutException newException = new TimeoutException("Invocation of " + method + " timed out."); + newException.initCause(exception); + + // remove the stack frames coming from the proxy interface invocation + final StackTraceElement[] stackTrace = callStackCapture.getStackTrace(); + newException.setStackTrace(Arrays.copyOfRange(stackTrace, 3, stackTrace.length)); Review comment: Can we filter out the stack frames whose classes belong to o.a.f.runtime.rpc.akka instead of hard-coding the offset 3? Otherwise, someone may unintentionally change this function's call stack in the future.
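The reviewer's suggestion above can be sketched as follows. This is a standalone illustration, not the actual Flink patch: the class name and the exact package prefix matched are assumptions; the idea is to drop every leading frame belonging to the RPC proxy machinery rather than skipping a fixed number of frames.

```java
import java.util.Arrays;

// Sketch of the review suggestion: instead of Arrays.copyOfRange(stackTrace, 3, ...),
// drop all leading frames whose declaring class is in the RPC proxy package,
// so the trimming stays correct if the internal call depth ever changes.
public class StackTraceFilter {
    // Assumed package prefix of the proxy machinery ("o.a.f.runtime.rpc.akka").
    private static final String PROXY_PACKAGE = "org.apache.flink.runtime.rpc.akka";

    static StackTraceElement[] stripProxyFrames(StackTraceElement[] stackTrace) {
        int start = 0;
        // Advance past every leading frame from the proxy package.
        while (start < stackTrace.length
                && stackTrace[start].getClassName().startsWith(PROXY_PACKAGE)) {
            start++;
        }
        // Keep only the caller's frames.
        return Arrays.copyOfRange(stackTrace, start, stackTrace.length);
    }
}
```

One trade-off of prefix matching is that it also removes proxy frames that appear deeper than the fixed offset would, which is usually the desired behavior here.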
[GitHub] [flink] buptljy commented on a change in pull request #11048: [FLINK-15966][runtime] Capture callstacks for RPC ask() calls to improve exceptions.
buptljy commented on a change in pull request #11048: [FLINK-15966][runtime] Capture callstacks for RPC ask() calls to improve exceptions. URL: https://github.com/apache/flink/pull/11048#discussion_r379740014 ## File path: flink-core/src/main/java/org/apache/flink/configuration/AkkaOptions.java ## @@ -29,6 +29,18 @@ @PublicEvolving public class AkkaOptions { + /** +* Timeout for akka ask calls. Review comment: Did you forget to modify this :) ?
[GitHub] [flink] dianfu edited a comment on issue #11009: [Flink-15873] [cep] fix matches result not returned when existing earlier partial matches
dianfu edited a comment on issue #11009: [Flink-15873] [cep] fix matches result not returned when existing earlier partial matches URL: https://github.com/apache/flink/pull/11009#issuecomment-586555110 @shuai-xu Thanks for the PR. Actually, I don't think this is a bug. It is by design that matched results are only emitted when there are no partial matches that are earlier than the matched result; otherwise the after-match skip strategy would be broken. Taking the changed test case MatchRecognizedITCase#testAggregates as an example, I think the original results are correct. What's your thought?
[GitHub] [flink] flinkbot edited a comment on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict
flinkbot edited a comment on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict URL: https://github.com/apache/flink/pull/10151#issuecomment-552442468 ## CI report: * c6d7f5e864076448dca590035a6a590dc5e25c44 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/135928709) * 682da0aec5dee14c09583468d15115e2a512c827 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/140139283) * 9c66fbe1e4d81c3656eba38d56d39dfe0c065f4f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142108012) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3864) * 1b23c2232e9717218a7c61c930c481cbcf2e6f2e Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/142214794) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3888) * 063b5e87dcef4a363f01a48e4af4fb9d3670429f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142218414) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3890) * d5dfd65a163634584e8eaeee452d5454b2d4fe45 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/143576795) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4193) * d1f5b89012dc266fc0c664085c1a2aef0c8b95ec Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/143669093) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4204) * f61c52bcb1420572c2ad94e3d2f1caafbf7f6081 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/143708993) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4215) * 9ea2c427c8f7046d78c122eea8f2d0c10c200224 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144479909) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4362) * fd4f362ceed0a4785c449c807cb46dac15f712ea Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148094252) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4978) * 8e6f927d76fed13e4c095619dee0d72e6a27359f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148125056) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4983) * 12a890c54ecc602a3d46e07e8ded4263469c Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148185789) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5005) * 1a8f90917d5fe5b928f4f70bad62a06f1ef9e3fa Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148207317) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5015) * d0f2df460885f14ad4d4f0331bde5ff2de9664ec Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148235635) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5018) * 97bdd27325be54d1ebfcdc2a4bb21b31d910d039 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148335745) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5040) * 4a56e798775f7f320a3e7b1436d28b00392caa09 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148395560) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5068) * 564913126f4ac833b52f587e2ccd9de2183dfd5f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148509000) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5081) * b2951a6743c376d92371d6eaf3f9be7048a3c29c Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149072760) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5193) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#issuecomment-581294934 ## CI report: * 65debfebc1bad89160a6375b9c78ea8eb063 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/147155607) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4764) * 1385f42a76ba0060d37022873064e736d52ea0af Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/147163976) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4766) * 9c1edd078c466ee56f2aed16ce9098e35bfae69e UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (FLINK-16069) Create TaskDeploymentDescriptor in future.
huweihua created FLINK-16069: Summary: Create TaskDeploymentDescriptor in future. Key: FLINK-16069 URL: https://issues.apache.org/jira/browse/FLINK-16069 Project: Flink Issue Type: Improvement Components: Runtime / Task Reporter: huweihua Deploying tasks takes a long time when we submit a high-parallelism job. Execution#deploy runs in the main thread, so it blocks the JobMaster from processing other akka messages, such as heartbeats. The creation of the TaskDeploymentDescriptor takes most of that time, so we can move the creation into a future. For example, for a job [source(8000)->sink(8000)], moving all 16000 tasks from SCHEDULED to DEPLOYING took more than 1 minute. This caused TaskManager heartbeats to time out and the job never succeeded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
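A minimal sketch of the proposed change, moving the expensive descriptor creation off the main thread with a CompletableFuture. All names below are illustrative assumptions, not the actual Flink scheduler API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch: build the TaskDeploymentDescriptor-like payload on an
// I/O executor so the main thread stays free to process heartbeats; only the
// lightweight submission step would later run back on the main thread.
class DeploymentSketch {
    static CompletableFuture<String> createDescriptorAsync(int subtask, Executor ioExecutor) {
        return CompletableFuture.supplyAsync(
            () -> "descriptor-for-subtask-" + subtask, // stands in for the expensive serialization work
            ioExecutor);
    }
}
```

With 16000 tasks, 16000 such futures can be created without ever blocking the thread that answers heartbeat messages.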
[jira] [Updated] (FLINK-16068) table with keyword-escaped columns and computed_column_expression columns
[ https://issues.apache.org/jira/browse/FLINK-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pangliang updated FLINK-16068: -- Description: I use sql-client to create a table with keyword-escaped column and computed_column_expression column, like this: {code:java} CREATE TABLE source_kafka ( log STRING, `time` BIGINT, pt as proctime() ) WITH ( 'connector.type' = 'kafka', 'connector.version' = 'universal', 'connector.topic' = 'k8s-logs', 'connector.startup-mode' = 'latest-offset', 'connector.properties.zookeeper.connect' = 'zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181/kafka', 'connector.properties.bootstrap.servers' = 'kafka.default:9092', 'connector.properties.group.id' = 'testGroup', 'format.type'='json', 'format.fail-on-missing-field' = 'true', 'update-mode' = 'append' ); {code} Then I simply used it : {code:java} SELECT * from source_kafka limit 10;{code} got an exception: {code:java} java.io.IOException: Fail to run stream sql job at org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:164) at org.apache.zeppelin.flink.FlinkStreamSqlInterpreter.callSelect(FlinkStreamSqlInterpreter.java:108) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.callCommand(FlinkSqlInterrpeter.java:203) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.runSqlList(FlinkSqlInterrpeter.java:151) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.interpret(FlinkSqlInterrpeter.java:104) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:676) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:569) at org.apache.zeppelin.scheduler.Job.run(Job.java:172) at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) at 
org.apache.zeppelin.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered "time" at line 1, column 12. Was expecting one of: "ABS" ... "ARRAY" ... "AVG" ... "CARDINALITY" ... "CASE" ... "CAST" ... "CEIL" ... "CEILING" ... .. at org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:50) at org.apache.flink.table.planner.calcite.SqlExprToRexConverterImpl.convertToRexNodes(SqlExprToRexConverterImpl.java:79) at org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.scala:111) at org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3328) at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2357) at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2051) at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2005) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:646) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:148) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:135) at org.apache.flink.table.planner.operations.SqlToOperationConverter.toQueryOperation(SqlToOperationConverter.java:522) at 
org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlQuery(SqlToOperationConverter.java:436) at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:154) at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:66) at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:464) at org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:104) ... 13 more {code} I also did some tests, the following can run: {code:java} CREATE TABLE source_kafka ( log STRING, `a` BIGINT, pt as proctime() ) CREATE TABLE source_kafka ( log STRING,
[jira] [Created] (FLINK-16068) table with keyword-escaped columns and computed_column_expression columns
pangliang created FLINK-16068: - Summary: table with keyword-escaped columns and computed_column_expression columns Key: FLINK-16068 URL: https://issues.apache.org/jira/browse/FLINK-16068 Project: Flink Issue Type: Bug Components: Table SQL / Client Affects Versions: 1.10.0 Reporter: pangliang I use sql-client to create a table with keyword-escaped columns and computed_column_expression columns, like this: {code:java} CREATE TABLE source_kafka ( log STRING, `time` BIGINT, pt as proctime() ) WITH ( 'connector.type' = 'kafka', 'connector.version' = 'universal', 'connector.topic' = 'k8s-logs', 'connector.startup-mode' = 'latest-offset', 'connector.properties.zookeeper.connect' = 'zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181/kafka', 'connector.properties.bootstrap.servers' = 'kafka.default:9092', 'connector.properties.group.id' = 'testGroup', 'format.type'='json', 'format.fail-on-missing-field' = 'true', 'update-mode' = 'append' ); {code} Then I simply used it : {code:java} SELECT * from source_kafka limit 10;{code} got an exception: {code:java} java.io.IOException: Fail to run stream sql job at org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:164) at org.apache.zeppelin.flink.FlinkStreamSqlInterpreter.callSelect(FlinkStreamSqlInterpreter.java:108) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.callCommand(FlinkSqlInterrpeter.java:203) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.runSqlList(FlinkSqlInterrpeter.java:151) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.interpret(FlinkSqlInterrpeter.java:104) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:676) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:569) at org.apache.zeppelin.scheduler.Job.run(Job.java:172) at 
org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) at org.apache.zeppelin.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered "time" at line 1, column 12. Was expecting one of: "ABS" ... "ARRAY" ... "AVG" ... "CARDINALITY" ... "CASE" ... "CAST" ... "CEIL" ... "CEILING" ... .. at org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:50) at org.apache.flink.table.planner.calcite.SqlExprToRexConverterImpl.convertToRexNodes(SqlExprToRexConverterImpl.java:79) at org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.scala:111) at org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3328) at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2357) at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2051) at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2005) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:646) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:148) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:135) at 
org.apache.flink.table.planner.operations.SqlToOperationConverter.toQueryOperation(SqlToOperationConverter.java:522) at org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlQuery(SqlToOperationConverter.java:436) at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:154) at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:66) at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:464) at org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:104)
[GitHub] [flink] hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#discussion_r379713878 File path: flink-ml-parent/flink-ml-lib/pom.xml (the diff adds a maven-shade-plugin execution named shade-flink, bound to the package phase, that shades com.github.fommil.netlib:core) Review comment: @zentol The license has already been added by https://github.com/apache/flink/pull/8631/files. Or do you mean we should remove the licensing for release-1.10 and release-1.9? I think we can create a fix for that in another PR.
[GitHub] [flink] hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#discussion_r379713304 File path: flink-dist/src/main/assemblies/opt.xml (diff context around the flink-python_${scala.binary.version}-${project.version}.jar entry, mode 0644; the added lines were elided in this archive) Review comment: Hi @dianfu @zentol Thanks for your advice. I would be fine with adding an uber module, but I think there are some differences between the table module and the ml module. The table module needs an uber module to package all table sub-modules plus other modules, e.g., cep, while for ml there is no such requirement.
[GitHub] [flink] sunhaibotb commented on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict
sunhaibotb commented on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict URL: https://github.com/apache/flink/pull/10151#issuecomment-586546096 Latest update to resolve conflicts with master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) * 34ac6e03b86281ba56a4108f0f713f7a769b1291 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149041446) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5192) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) * 34ac6e03b86281ba56a4108f0f713f7a769b1291 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149041446) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5192)
[GitHub] [flink] bowenli86 commented on issue #11052: FLINK-15975 Use LinkedHashMap for deterministic iterations
bowenli86 commented on issue #11052: FLINK-15975 Use LinkedHashMap for deterministic iterations URL: https://github.com/apache/flink/pull/11052#issuecomment-586484297 can you rename the ticket and this PR?
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) * 34ac6e03b86281ba56a4108f0f713f7a769b1291 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/149041446) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5192)
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) * 34ac6e03b86281ba56a4108f0f713f7a769b1291 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379569902 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
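[Editor's note] The key-group mechanism described in the quoted documentation can be illustrated with a small sketch. This is a simplification, not Flink's implementation: Python's built-in `hash` stands in for the murmur hash Flink applies, and the contiguous-range assignment mirrors the general idea rather than the exact arithmetic of Flink's `KeyGroupRangeAssignment`.

```python
def key_group_for(key, max_parallelism):
    # Every key deterministically maps to one of `max_parallelism` key groups.
    return hash(key) % max_parallelism

def key_group_range(operator_index, parallelism, max_parallelism):
    # Each parallel operator instance owns a contiguous range of key groups;
    # together the ranges partition [0, max_parallelism).
    start = operator_index * max_parallelism // parallelism
    end = (operator_index + 1) * max_parallelism // parallelism
    return range(start, end)

# Rescaling from parallelism 2 to 3 moves whole key groups, never individual
# keys: the same 128 groups are re-partitioned into three contiguous ranges.
MAX_PARALLELISM = 128
new_ranges = [key_group_range(i, 3, MAX_PARALLELISM) for i in range(3)]
assert sum(len(r) for r in new_ranges) == MAX_PARALLELISM
```

This is why the maximum parallelism fixes the number of key groups up front: redistribution only ever moves group boundaries, so state for one key never has to be split.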
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379575432 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379572141 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379560406 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing Review comment: As far as I remember, this was supposed to be about the conceptual differences between state handling in batch and stream processing. I don't remember exactly, what we talked about here. The only thing I can think of right now is, that the data structures in which state is stored need to be different in batch and stream processing for it to be efficient. I guess one could also say something about boundedness and expiry of state? @StephanEwen what did we have in mind here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379554124 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. Review comment: is the sequence of events the state or does the state store the sequence of events? I would say the former. Same for the examples below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379575236 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
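[Editor's note] The Kafka-offset example in the quoted documentation can be sketched: each parallel consumer instance checkpoints its partition-to-offset entries as operator state, and on rescaling the union of all entries is redistributed. Round-robin is one plausible redistribution scheme; the function name and data shapes here are illustrative, not Flink's API.

```python
def redistribute_operator_state(snapshots, new_parallelism):
    # Take the union of the (partition, offset) entries checkpointed by the
    # old instances, then deal them out round-robin to the new instances.
    union = [entry for snapshot in snapshots for entry in snapshot]
    new_snapshots = [[] for _ in range(new_parallelism)]
    for i, entry in enumerate(union):
        new_snapshots[i % new_parallelism].append(entry)
    return new_snapshots

# Two consumer instances scale up to three; every offset survives exactly
# once, it just lives on a different instance afterwards.
old = [[("topic-0", 42), ("topic-2", 7)], [("topic-1", 99)]]
rescaled = redistribute_operator_state(old, 3)
assert sorted(e for s in rescaled for e in s) == sorted(e for s in old for e in s)
```

The essential property is the one the assertion checks: redistribution preserves the multiset of entries, regardless of which instance ends up owning each one.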
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379587916 ## File path: docs/dev/stream/state/state.md ## @@ -22,66 +22,17 @@ specific language governing permissions and limitations under the License. --> -This document explains how to use Flink's state abstractions when developing an application. +In this section you will learn about the stateful abstractions that Flink Review comment: see above This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379586538 ## File path: docs/concepts/timely-stream-processing.md ## @@ -0,0 +1,237 @@ +--- +title: Timely Stream Processing +nav-id: timely-stream-processing +nav-pos: 3 +nav-title: Timely Stream Processing +nav-parent_id: concepts +--- + + +`TODO: add introduction` + +* This will be replaced by the TOC +{:toc} + +## Latency & Completeness + +`TODO: add these two sections` + +### Latency vs. Completeness in Batch & Stream Processing + +{% top %} + +## Event Time, Processing Time, and Ingestion Time + +When referring to time in a streaming program (for example to define windows), +one can refer to different notions of *time*: + +- **Processing time:** Processing time refers to the system time of the machine + that is executing the respective operation. + + When a streaming program runs on processing time, all time-based operations + (like time windows) will use the system clock of the machines that run the + respective operator. An hourly processing time window will include all + records that arrived at a specific operator between the times when the system + clock indicated the full hour. For example, if an application begins running + at 9:15am, the first hourly processing time window will include events + processed between 9:15am and 10:00am, the next window will include events + processed between 10:00am and 11:00am, and so on. + + Processing time is the simplest notion of time and requires no coordination + between streams and machines. It provides the best performance and the + lowest latency. 
However, in distributed and asynchronous environments + processing time does not provide determinism, because it is susceptible to + the speed at which records arrive in the system (for example from the message + queue), to the speed at which the records flow between operators inside the + system, and to outages (scheduled, or otherwise). + +- **Event time:** Event time is the time that each individual event occurred on + its producing device. This time is typically embedded within the records + before they enter Flink, and that *event timestamp* can be extracted from + each record. In event time, the progress of time depends on the data, not on + any wall clocks. Event time programs must specify how to generate *Event Time + Watermarks*, which is the mechanism that signals progress in event time. This + watermarking mechanism is described in a later section, + [below](#event-time-and-watermarks). + + In a perfect world, event time processing would yield completely consistent + and deterministic results, regardless of when events arrive, or their + ordering. However, unless the events are known to arrive in-order (by + timestamp), event time processing incurs some latency while waiting for + out-of-order events. As it is only possible to wait for a finite period of + time, this places a limit on how deterministic event time applications can + be. + + Assuming all of the data has arrived, event time operations will behave as + expected, and produce correct and consistent results even when working with + out-of-order or late events, or when reprocessing historic data. For example, + an hourly event time window will contain all records that carry an event + timestamp that falls into that hour, regardless of the order in which they + arrive, or when they are processed. (See the section on [late + events](#late-elements) for more information.) 
+ + + + Note that sometimes when event time programs are processing live data in + real-time, they will use some *processing time* operations in order to + guarantee that they are progressing in a timely fashion. + +- **Ingestion time:** Ingestion time is the time that events enter Flink. At + the source operator each record gets the source's current time as a + timestamp, and time-based operations (like time windows) refer to that + timestamp. + + *Ingestion time* sits conceptually in between *event time* and *processing + time*. Compared to *processing time*, it is slightly more expensive, but + gives more predictable results. Because *ingestion time* uses stable + timestamps (assigned once at the source), different window operations over + the records will refer to the same timestamp, whereas in *processing time* + each window operator may assign the record to a different window (based on + the local system clock and any transport delay). + + Compared to *event time*, *ingestion time* programs cannot handle any + out-of-order events or late data, but the programs don't have to specify how + to generate *watermarks*. + + Internally, *ingestion time* is treated much like *event time*, but with + automatic timestamp assignment and automatic watermark generation. + + + +{% top %} +
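The three notions of time quoted in the comment above are easy to illustrate with a short sketch. The following Python snippet is purely illustrative (Flink's actual window operators live in the Java/Scala DataStream API, not here): it assigns the same records to hourly tumbling windows once by their event timestamp and once by their arrival (processing) time.

```python
# Illustrative sketch (not Flink API): assigning records to hourly
# tumbling windows under event time vs. processing time.

HOUR = 3600  # window size in seconds

def window_start(timestamp: int) -> int:
    """Start of the hourly window that `timestamp` falls into."""
    return timestamp - (timestamp % HOUR)

# Records as (event_time, arrival_time) pairs; the second record
# arrives late, after the wall clock has crossed the hour boundary.
records = [(3500, 3550), (3590, 3700)]

event_time_windows = [window_start(ev) for ev, _ in records]
processing_time_windows = [window_start(arr) for _, arr in records]

print(event_time_windows)       # [0, 0] - both in the [0, 3600) window
print(processing_time_windows)  # [0, 3600] - late record lands elsewhere
```

Under event time both records belong to the first hour regardless of when they arrive; under processing time the late record is assigned to the next window, which is exactly the non-determinism the quoted text describes.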
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379562658 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + Review comment: I think, this introduction should already mention the concept and importance of state locality. Maybe with the typical figure of two-tiered architecture to state and logic fused into one thing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379554299 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant Review comment: ```suggestion Flink needs to be aware of the state in order to make it fault tolerant ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
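The examples in the quoted list (pattern state, pending aggregates, model parameters) all reduce to the same mechanic: an operator keeps a value per key and updates it as events arrive. A minimal sketch of the "pending aggregates" case — plain Python standing in for a per-key `ValueState`, not Flink code:

```python
# Illustrative sketch (not Flink API): a stateful per-key aggregation.
# The dict of pending sums plays the role of the state a Flink
# window operator would keep per key.
from collections import defaultdict

pending_sums: dict[str, int] = defaultdict(int)

def on_event(key: str, value: int) -> None:
    """Update the pending aggregate for `key` - a stateful operation."""
    pending_sums[key] += value

for key, value in [("a", 1), ("b", 5), ("a", 2)]:
    on_event(key, value)

print(dict(pending_sums))  # {'a': 3, 'b': 5}
```

The point the quoted doc makes is that Flink must know about `pending_sums` (via its state abstractions) to checkpoint it, restore it on failure, and redistribute it on rescaling.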
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379583881 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data +*streams*. + +There must always be at least one TaskManager. + +The Flink Master and TaskManagers can be started in various ways: directly on +the machines as a [standalone cluster]({{ site.baseurl }}{% link +ops/deployment/cluster_setup.md %}), in containers, or managed by resource +frameworks like [YARN]({{ site.baseurl }}{% link ops/deployment/yarn_setup.md +%}) or [Mesos]({{ site.baseurl }}{% link ops/deployment/mesos.md %}). +TaskManagers connect to Flink Masters, announcing themselves as available, and +are assigned work. + +The *client* is not part of the runtime and program execution, but is used to +prepare and send a dataflow to the Flink Master. 
After that, the client can +disconnect, or stay connected to receive progress reports. The client runs +either as part of the Java/Scala program that triggers the execution, or in the +command line process `./bin/flink run ...`. + + + +{% top %} + +## Tasks and Operator Chains + +For distributed execution, Flink *chains* operator subtasks together into +*tasks*. Each task is executed by one thread. Chaining operators together into +tasks is a useful optimization: it reduces the overhead of thread-to-thread +handover and buffering, and increases overall throughput while decreasing +latency. The chaining behavior can be configured; see the [chaining docs]({{ +site.baseurl }}{% link dev/stream/operators/index.md +%}#task-chaining-and-resource-groups) for details. + +The sample dataflow in the figure below is executed with five subtasks, and +hence with five parallel threads. + + + +{% top %} + +## Task Slots and Resources + +Each worker (TaskManager) is a *JVM process*, and may execute one or more +subtasks in separate threads. To control how many tasks a worker accepts, a +worker has so called **task slots** (at least one). + +Each *task slot* represents a fixed subset of resources of the TaskManager. A Review comment: There is already quite some operational content mixed in here. Could be shorter in the Concepts section in my opinion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379547482 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + + +Often there is a one-to-one correspondence between the transformations in the +programs and the operators in the dataflow. Sometimes, however, one +transformation may consist of multiple transformation operators. + +{% top %} + +## Parallel Dataflows + +Programs in Flink are inherently parallel and distributed. During execution, a +*stream* has one or more **stream partitions**, and each *operator* has one or +more **operator subtasks**. The operator subtasks are independent of one +another, and execute in different threads and possibly on different machines or +containers. 
+ +The number of operator subtasks is the **parallelism** of that particular +operator. The parallelism of a stream is always that of its producing operator. +Different operators of the same program may have different levels of +parallelism. + + Review comment: I would update this figure. Top: Logical (Data Flow) Graph Bottom: Physical Graph In the bottom graph, all subtasks for an operator together are *not* an operator. It looks like it in the figure. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
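The stream partitions and operator subtasks described in the hunk above rest on one mechanism: a keyed record is always routed to the same downstream subtask. A deterministic sketch of that routing (plain Python with a CRC32 hash as an assumed stand-in — Flink's runtime does this internally, with its own hash):

```python
# Illustrative sketch (not Flink internals): routing records of a
# keyed stream to the parallel subtasks of a downstream operator.
import zlib

parallelism = 3  # number of downstream operator subtasks

def target_subtask(key: str) -> int:
    """Deterministically pick the subtask that owns this key."""
    return zlib.crc32(key.encode()) % parallelism

events = ["user-1", "user-2", "user-1", "user-3"]
routes = [target_subtask(k) for k in events]

# The same key always reaches the same subtask, so per-key state
# stays local to one parallel instance.
assert routes[0] == routes[2]
print(routes)
```

Because routing depends only on the key and the parallelism, subtasks never need to coordinate on key ownership — which is what lets them run independently in different threads, machines, or containers.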
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379584764 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data +*streams*. + +There must always be at least one TaskManager. + +The Flink Master and TaskManagers can be started in various ways: directly on +the machines as a [standalone cluster]({{ site.baseurl }}{% link +ops/deployment/cluster_setup.md %}), in containers, or managed by resource +frameworks like [YARN]({{ site.baseurl }}{% link ops/deployment/yarn_setup.md +%}) or [Mesos]({{ site.baseurl }}{% link ops/deployment/mesos.md %}). +TaskManagers connect to Flink Masters, announcing themselves as available, and +are assigned work. + +The *client* is not part of the runtime and program execution, but is used to +prepare and send a dataflow to the Flink Master. 
After that, the client can +disconnect, or stay connected to receive progress reports. The client runs +either as part of the Java/Scala program that triggers the execution, or in the +command line process `./bin/flink run ...`. + + + +{% top %} + +## Tasks and Operator Chains + +For distributed execution, Flink *chains* operator subtasks together into +*tasks*. Each task is executed by one thread. Chaining operators together into +tasks is a useful optimization: it reduces the overhead of thread-to-thread +handover and buffering, and increases overall throughput while decreasing +latency. The chaining behavior can be configured; see the [chaining docs]({{ +site.baseurl }}{% link dev/stream/operators/index.md +%}#task-chaining-and-resource-groups) for details. + +The sample dataflow in the figure below is executed with five subtasks, and +hence with five parallel threads. + + + +{% top %} + +## Task Slots and Resources + +Each worker (TaskManager) is a *JVM process*, and may execute one or more +subtasks in separate threads. To control how many tasks a worker accepts, a +worker has so called **task slots** (at least one). + +Each *task slot* represents a fixed subset of resources of the TaskManager. A +TaskManager with three slots, for example, will dedicate 1/3 of its managed +memory to each slot. Slotting the resources means that a subtask will not +compete with subtasks from other jobs for managed memory, but instead has a +certain amount of reserved managed memory. Note that no CPU isolation happens +here; currently slots only separate the managed memory of tasks. + +By adjusting the number of task slots, users can define how subtasks are +isolated from each other. Having one slot per TaskManager means each task +group runs in a separate JVM (which can be started in a separate container, for +example). Having multiple slots means more subtasks share the same JVM. Tasks +in the same JVM share TCP connections (via multiplexing) and heartbeat +messages. 
They may also share data sets and data structures, thus reducing the +per-task overhead. + + + +By default, Flink allows subtasks to share slots even if they are subtasks of +different tasks, so long as they are from the same job. The result is that one +slot may hold an entire pipeline of the job. Allowing this *slot sharing* has +two main benefits: + + - A Flink cluster needs exactly as many task slots as the highest parallelism +used in the job. No need to calculate how many tasks (with varying +parallelism) a program contains in total. + + - It is easier to get better resource utilization. Without slot sharing, the +non-intensive *source/map()* subtasks would block as many resources as the +resource intensive *window* subtasks. With slot sharing, increasing the +base parallelism in our example from two to six yields full utilization of +the slotted resources, while making sure that the heavy subtasks are fairly +distributed among the TaskManagers. + + + +The APIs also include a
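The first slot-sharing benefit quoted above — "exactly as many task slots as the highest parallelism used in the job" — can be made concrete with a small sketch. The operator names and parallelisms below are assumed for illustration (loosely following the source/map vs. window example in the quoted text); real scheduling also depends on slot-sharing groups:

```python
# Illustrative sketch: slot demand with and without slot sharing.
# Assumed operator parallelisms for a hypothetical job.
parallelisms = {"source/map": 2, "window": 6, "sink": 2}

# With slot sharing, one slot can hold one subtask of every operator,
# so one slot hosts a whole pipeline of the job.
slots_with_sharing = max(parallelisms.values())

# Without sharing, every subtask needs its own slot.
slots_without_sharing = sum(parallelisms.values())

print(slots_with_sharing)     # 6
print(slots_without_sharing)  # 10
```

This is why, in the quoted example, raising the base parallelism to match the heavy window operator fully utilizes the slotted resources instead of leaving light source/map slots idle.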
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379565697 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are Review comment: I would not consider KeyGroups a conceptual topic, but rather an operational or internal one. I might miss something though. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
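The key-group scheme under discussion in the message above can be sketched concretely. The snippet below is a simplified model (inspired by, but not identical to, Flink's `KeyGroupRangeAssignment`; CRC32 is an assumed stand-in for the real hash): a key always hashes into one of `max_parallelism` key groups, and each operator instance owns a contiguous range of groups, so rescaling moves whole groups rather than individual keys.

```python
# Illustrative sketch (simplified, not Flink's exact implementation):
# assigning keys to key groups, and key groups to operator instances.
import zlib

max_parallelism = 128  # number of key groups, fixed for the job
parallelism = 4        # current number of parallel operator instances

def key_group(key: str) -> int:
    """Key group a key belongs to - stable across rescaling."""
    return zlib.crc32(key.encode()) % max_parallelism

def operator_index(group: int) -> int:
    """Operator instance that currently owns this key group."""
    return group * parallelism // max_parallelism

# Each instance owns max_parallelism / parallelism = 32 key groups.
g = key_group("user-17")
print(0 <= g < max_parallelism, 0 <= operator_index(g) < parallelism)  # True True
```

Whether this belongs in a concepts page or an internals page is exactly the reviewer's question; the mechanism itself is a few lines either way.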
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379583328 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data +*streams*. + +There must always be at least one TaskManager. + +The Flink Master and TaskManagers can be started in various ways: directly on +the machines as a [standalone cluster]({{ site.baseurl }}{% link +ops/deployment/cluster_setup.md %}), in containers, or managed by resource +frameworks like [YARN]({{ site.baseurl }}{% link ops/deployment/yarn_setup.md +%}) or [Mesos]({{ site.baseurl }}{% link ops/deployment/mesos.md %}). +TaskManagers connect to Flink Masters, announcing themselves as available, and +are assigned work. + +The *client* is not part of the runtime and program execution, but is used to +prepare and send a dataflow to the Flink Master. 
After that, the client can +disconnect, or stay connected to receive progress reports. The client runs +either as part of the Java/Scala program that triggers the execution, or in the +command line process `./bin/flink run ...`. + + + +{% top %} + +## Tasks and Operator Chains + +For distributed execution, Flink *chains* operator subtasks together into +*tasks*. Each task is executed by one thread. Chaining operators together into +tasks is a useful optimization: it reduces the overhead of thread-to-thread +handover and buffering, and increases overall throughput while decreasing +latency. The chaining behavior can be configured; see the [chaining docs]({{ +site.baseurl }}{% link dev/stream/operators/index.md +%}#task-chaining-and-resource-groups) for details. + +The sample dataflow in the figure below is executed with five subtasks, and +hence with five parallel threads. + + Review comment: Top: Logical Graph, there are no tasks in the logical graph Bottom: Physical Graph This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379567637 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcasted to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379573424 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379576013 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379569436 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379571150 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379587809 ## File path: docs/dev/stream/state/index.md ## @@ -25,23 +25,10 @@ specific language governing permissions and limitations under the License. --> -Stateful functions and operators store data across the processing of individual elements/events, making state a critical building block for -any type of more elaborate operation. - -For example: - - - When an application searches for certain event patterns, the state will store the sequence of events encountered so far. - - When aggregating events per minute/hour/day, the state holds the pending aggregates. - - When training a machine learning model over a stream of data points, the state holds the current version of the model parameters. - - When historic data needs to be managed, the state allows efficient access to events that occurred in the past. - -Flink needs to be aware of the state in order to make state fault tolerant using [checkpoints](checkpointing.html) and to allow [savepoints]({{ site.baseurl }}/ops/state/savepoints.html) of streaming applications. - -Knowledge about the state also allows for rescaling Flink applications, meaning that Flink takes care of redistributing state across parallel instances. - -The [queryable state](queryable_state.html) feature of Flink allows you to access state from outside of Flink during runtime. - -When working with state, it might also be useful to read about [Flink's state backends]({{ site.baseurl }}/ops/state/state_backends.html). Flink provides different state backends that specify how and where state is stored. State can be located on Java's heap or off-heap. 
Depending on your state backend, Flink can also *manage* the state for the application, meaning Flink deals with the memory management (possibly spilling to disk if necessary) to allow applications to hold very large state. State backends can be configured without changing your application logic. +In this section you will learn about the stateful abstractions that Flink Review comment: are abstractions stateful? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
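The Key Groups mechanism described in the quoted concepts text can be sketched in a few lines. The snippet below is an illustrative model only, not Flink's actual implementation (Flink applies a murmur hash on top of the key's `hashCode()`; a plain Python `hash` stands in for it here). Keys hash into a fixed number of key groups (the maximum parallelism), and contiguous ranges of key groups are assigned to operator instances, so rescaling moves whole key groups rather than individual keys:

```python
def assign_to_key_group(key, max_parallelism):
    # Hash the key into one of `max_parallelism` key groups.
    # Python's % always yields a non-negative result for a positive modulus,
    # so negative hashes are fine.
    return hash(key) % max_parallelism

def operator_index_for_key_group(max_parallelism, parallelism, key_group):
    # Key groups are split into `parallelism` contiguous ranges; each
    # operator instance owns one range.
    return key_group * parallelism // max_parallelism

# With max parallelism 128 and 4 parallel instances, each instance
# owns a contiguous range of 32 key groups.
owners = {operator_index_for_key_group(128, 4, g) for g in range(128)}
```

Because the group-to-instance assignment is range-based, a rescale only needs to hand whole key-group ranges between instances instead of reshuffling every key, which is why the key group is the atomic unit of redistribution.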
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379566326 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above) Review comment: ```suggestion operator instances when the parallelism is changed. There are different ``` This is an automated message from the Apache Git Service.
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379546260 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + + +Often there is a one-to-one correspondence between the transformations in the +programs and the operators in the dataflow. Sometimes, however, one +transformation may consist of multiple transformation operators. Review comment: ```suggestion transformation may consist of multiple operators. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
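The streams-and-transformations model in the quoted stream-processing.md hunk can be illustrated with a minimal sketch. Plain Python generators stand in for Flink's dataflow here, purely for illustration (this is not Flink API): a dataflow chains one or more sources through transformation operators into sinks, and each operator consumes input streams and produces output streams.

```python
# Toy dataflow: source -> transformation -> sink, modeled with generators.

def source():
    """A (finite, for this sketch) stream of records."""
    yield from range(5)

def double(stream):
    """A transformation operator: takes one input stream, yields one output stream."""
    for record in stream:
        yield record * 2

collected = []

def sink(stream):
    """A sink terminates the dataflow by consuming the stream."""
    for record in stream:
        collected.append(record)

# Chaining the pieces builds the (here, linear) dataflow graph;
# records flow through it lazily as the sink pulls them.
sink(double(source()))
```

In a real Flink program the graph may be an arbitrary DAG with multiple sources and sinks, and one API-level transformation may expand into several runtime operators, as the review discussion above notes.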
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379568312 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink; the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism.
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcasted to all downstream tasks, where it is stored locally and
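The Key Group mechanics described in the quoted excerpt can be sketched in a few lines. This is a simplified illustration under assumed semantics (Flink's actual logic lives in `KeyGroupRangeAssignment` and applies a murmur hash to the key's `hashCode()`); none of the names below are Flink's API:

```python
# Sketch of keyed-state redistribution via key groups (simplified; not
# Flink's real implementation, which uses a murmur hash of the key hash).

def key_group_for(key, max_parallelism: int) -> int:
    # Every key deterministically belongs to exactly one key group.
    return hash(key) % max_parallelism

def operator_for(key_group: int, parallelism: int, max_parallelism: int) -> int:
    # Key groups map to operator subtasks in contiguous ranges, so
    # rescaling only moves whole key groups between parallel instances.
    return key_group * parallelism // max_parallelism

max_parallelism = 128          # number of key groups, fixed for the job
keys = ["user-1", "user-2", "user-3"]

# The key -> key-group mapping never changes ...
groups = {k: key_group_for(k, max_parallelism) for k in keys}

# ... only the key-group -> subtask mapping changes when rescaling 2 -> 4.
before = {k: operator_for(g, 2, max_parallelism) for k, g in groups.items()}
after = {k: operator_for(g, 4, max_parallelism) for k, g in groups.items()}
```

Because the maximum parallelism fixes the number of key groups up front, state for a key never needs to be re-hashed on rescaling — only whole key-group ranges are reassigned.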
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379582165 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will Review comment: ```suggestion There is always at least one *Flink Master*. A high-availability setup might ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379556110 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above, ending at:) +The [queryable state]({{ site.baseurl }}{% link Review comment: Queryable State allows you to...
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379570309 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379576154 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379581742 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379546166 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + + +Often there is a one-to-one correspondence between the transformations in the +programs and the operators in the dataflow. Sometimes, however, one Review comment: ```suggestion programs and the operators in the logical dataflow graph. Sometimes, however, one ```
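The source → transformation → sink structure described in the quoted excerpt can be sketched with a toy pipeline. This is purely illustrative under assumed semantics — `Stream`, `map`, and `sink` here are hypothetical names, not Flink's DataStream API:

```python
# Toy model of a streaming dataflow: source -> transformations -> sink.
# A program's chained calls form a small directed acyclic graph of operators.

class Stream:
    def __init__(self, events):
        self.events = iter(events)

    def map(self, fn):
        # Transformation: one output record per input record.
        return Stream(fn(e) for e in self.events)

    def filter(self, pred):
        # Transformation: forwards only records matching the predicate.
        return Stream(e for e in self.events if pred(e))

    def sink(self):
        # Sink: materializes the stream (here, into a list).
        return list(self.events)

source = Stream(["1", "2", "3", "4"])           # source
result = (source
          .map(int)                             # transformation
          .filter(lambda x: x % 2 == 0)         # transformation
          .sink())                              # sink
# result == [2, 4]
```

Each method call corresponds to one node in the logical dataflow graph, which is why the one-to-one correspondence between transformations and operators usually holds.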
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379552116 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a Review comment: Operations vs Operator vs Function vs Transformation?
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379571692 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379550966 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + + +Often there is a one-to-one correspondence between the transformations in the +programs and the operators in the dataflow. Sometimes, however, one +transformation may consist of multiple transformation operators. + +{% top %} + +## Parallel Dataflows + +Programs in Flink are inherently parallel and distributed. During execution, a +*stream* has one or more **stream partitions**, and each *operator* has one or +more **operator subtasks**. The operator subtasks are independent of one +another, and execute in different threads and possibly on different machines or +containers. 
+ +The number of operator subtasks is the **parallelism** of that particular +operator. The parallelism of a stream is always that of its producing operator. +Different operators of the same program may have different levels of +parallelism. + + + +Streams can transport data between two operators in a *one-to-one* (or Review comment: I think the different redistribution patterns that Fabian uses in his book are more to the point. I think it was: * Forward * Broadcast * Random * Keyed IMHO the additional classification in "Redistributing" and "One-to-one" does not help. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
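The redistribution patterns discussed in this thread (forward, broadcast, random, keyed) can be sketched outside of Flink. The helpers below are hypothetical illustrations of how records are routed between an upstream subtask and downstream subtasks, not Flink's internal classes:

```python
import random
import zlib

def forward(subtask_index):
    """One-to-one: records stay in the partition of the producing subtask."""
    return subtask_index

def broadcast(downstream_parallelism):
    """Broadcast: every downstream subtask receives the record."""
    return list(range(downstream_parallelism))

def random_channel(downstream_parallelism, rng=random.Random(42)):
    """Random: pick an arbitrary downstream subtask."""
    return rng.randrange(downstream_parallelism)

def keyed(key, downstream_parallelism):
    """Keyed: all records with the same key hash to the same subtask."""
    return zlib.crc32(str(key).encode("utf-8")) % downstream_parallelism
```

Only the keyed pattern guarantees that all records for a given key meet at the same subtask, which is what makes keyed state possible downstream.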
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379545948 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + Review comment: In the glossary we call the "streaming dataflow" "logical graph" or "jobgraph". Might want to add this here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379566803 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be Review comment: ```suggestion support use cases where records
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379576359 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcasted to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379566028 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link Review comment: parallel operator instance => sub-task? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
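The key-group mechanics quoted in this file (exactly as many key groups as the defined maximum parallelism, with each parallel instance owning one or more contiguous groups) can be sketched as follows. This is a simplified illustration, not Flink's actual implementation:

```python
import zlib

def key_group(key, max_parallelism):
    # Exactly max_parallelism key groups exist; a key always lands in the
    # same group, independent of the operator's current parallelism.
    return zlib.crc32(str(key).encode("utf-8")) % max_parallelism

def operator_index(group, max_parallelism, parallelism):
    # Each parallel instance owns a contiguous range of key groups, so
    # rescaling moves whole groups between instances, never single keys.
    return group * parallelism // max_parallelism
```

Rescaling from, say, parallelism 2 to 4 only reassigns whole key groups, which is why Flink can redistribute keyed state without rehashing every key.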
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379574304 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcasted to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379582452 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which one is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data Review comment: according to the glossary "(or more specifically, the subtasks)" does not make sense. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379551776 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are Review comment: ```suggestion across multiple events (for example window operators). These operations are ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379587424 ## File path: docs/concepts/timely-stream-processing.md ## @@ -0,0 +1,237 @@ +--- +title: Timely Stream Processing +nav-id: timely-stream-processing +nav-pos: 3 +nav-title: Timely Stream Processing +nav-parent_id: concepts +--- + + +`TODO: add introduction` + +* This will be replaced by the TOC +{:toc} + +## Latency & Completeness + +`TODO: add these two sections` + +### Latency vs. Completeness in Batch & Stream Processing + +{% top %} + +## Event Time, Processing Time, and Ingestion Time + +When referring to time in a streaming program (for example to define windows), +one can refer to different notions of *time*: + +- **Processing time:** Processing time refers to the system time of the machine + that is executing the respective operation. + + When a streaming program runs on processing time, all time-based operations + (like time windows) will use the system clock of the machines that run the + respective operator. An hourly processing time window will include all + records that arrived at a specific operator between the times when the system + clock indicated the full hour. For example, if an application begins running + at 9:15am, the first hourly processing time window will include events + processed between 9:15am and 10:00am, the next window will include events + processed between 10:00am and 11:00am, and so on. + + Processing time is the simplest notion of time and requires no coordination + between streams and machines. It provides the best performance and the + lowest latency. 
However, in distributed and asynchronous environments + processing time does not provide determinism, because it is susceptible to + the speed at which records arrive in the system (for example from the message + queue), to the speed at which the records flow between operators inside the + system, and to outages (scheduled, or otherwise). + +- **Event time:** Event time is the time that each individual event occurred on + its producing device. This time is typically embedded within the records + before they enter Flink, and that *event timestamp* can be extracted from + each record. In event time, the progress of time depends on the data, not on + any wall clocks. Event time programs must specify how to generate *Event Time + Watermarks*, which is the mechanism that signals progress in event time. This + watermarking mechanism is described in a later section, + [below](#event-time-and-watermarks). + + In a perfect world, event time processing would yield completely consistent + and deterministic results, regardless of when events arrive, or their + ordering. However, unless the events are known to arrive in-order (by + timestamp), event time processing incurs some latency while waiting for + out-of-order events. As it is only possible to wait for a finite period of + time, this places a limit on how deterministic event time applications can + be. + + Assuming all of the data has arrived, event time operations will behave as + expected, and produce correct and consistent results even when working with + out-of-order or late events, or when reprocessing historic data. For example, + an hourly event time window will contain all records that carry an event + timestamp that falls into that hour, regardless of the order in which they + arrive, or when they are processed. (See the section on [late + events](#late-elements) for more information.) 
+ + + + Note that sometimes when event time programs are processing live data in + real-time, they will use some *processing time* operations in order to + guarantee that they are progressing in a timely fashion. + +- **Ingestion time:** Ingestion time is the time that events enter Flink. At + the source operator each record gets the source's current time as a + timestamp, and time-based operations (like time windows) refer to that + timestamp. + + *Ingestion time* sits conceptually in between *event time* and *processing + time*. Compared to *processing time*, it is slightly more expensive, but + gives more predictable results. Because *ingestion time* uses stable + timestamps (assigned once at the source), different window operations over + the records will refer to the same timestamp, whereas in *processing time* + each window operator may assign the record to a different window (based on + the local system clock and any transport delay). + + Compared to *event time*, *ingestion time* programs cannot handle any + out-of-order events or late data, but the programs don't have to specify how + to generate *watermarks*. + + Internally, *ingestion time* is treated much like *event time*, but with + automatic timestamp assignment and automatic watermark generation. + + + +{% top %} +
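The hourly-window behaviour contrasted above for processing time and event time can be made concrete with a small sketch. These helpers are illustrative (tumbling-window assignment plus a bounded-out-of-orderness watermark), not the actual Flink API:

```python
HOUR_MS = 3600 * 1000

def hourly_window(timestamp_ms):
    # Tumbling one-hour window [start, end) containing the given
    # event-time (or ingestion-time) timestamp.
    start = timestamp_ms - (timestamp_ms % HOUR_MS)
    return (start, start + HOUR_MS)

def watermark(max_timestamp_seen_ms, max_out_of_orderness_ms):
    # Bounded-out-of-orderness watermark: signals that no events with a
    # timestamp at or below this value are expected anymore.
    return max_timestamp_seen_ms - max_out_of_orderness_ms
```

An event stamped 9:15 always falls into the [9:00, 10:00) window regardless of when it arrives, whereas a processing-time window depends on the wall clock at the operator when the record happens to be processed.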
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379571869 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink; the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism.
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
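[Editorial note] The key-group mechanics quoted above can be sketched in a few lines. This is an illustrative sketch, not Flink's implementation: the real assignment (in `KeyGroupRangeAssignment`) murmur-hashes the key's `hashCode()`, whereas this sketch uses Python's built-in `hash`, and both function names are invented.

```python
# Sketch of key groups (assumed names; not Flink's real code).
# Flink's KeyGroupRangeAssignment murmur-hashes key.hashCode(); here we use
# Python's hash() only to illustrate the idea.

def key_group_for(key, max_parallelism: int) -> int:
    """Each key deterministically belongs to exactly one of
    `max_parallelism` key groups."""
    return hash(key) % max_parallelism

def operator_index_for(key_group: int, max_parallelism: int,
                       parallelism: int) -> int:
    """Key groups are assigned to parallel operator instances in
    contiguous ranges; rescaling moves whole key groups, never
    individual keys."""
    return key_group * parallelism // max_parallelism

MAX_PARALLELISM = 128  # fixed for the lifetime of the job

# Rescaling from 2 to 4 instances only reassigns whole key groups:
for parallelism in (2, 4):
    ranges = {}
    for kg in range(MAX_PARALLELISM):
        idx = operator_index_for(kg, MAX_PARALLELISM, parallelism)
        ranges.setdefault(idx, []).append(kg)
    # every instance owns a contiguous, non-overlapping range of key groups
    assert all(r == list(range(r[0], r[-1] + 1)) for r in ranges.values())
```

Because the key-group-to-instance mapping is monotone, each parallel instance always owns a contiguous range of key groups, which is what makes state redistribution on rescaling cheap.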
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379584150 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data +*streams*. + +There must always be at least one TaskManager. + +The Flink Master and TaskManagers can be started in various ways: directly on +the machines as a [standalone cluster]({{ site.baseurl }}{% link +ops/deployment/cluster_setup.md %}), in containers, or managed by resource +frameworks like [YARN]({{ site.baseurl }}{% link ops/deployment/yarn_setup.md +%}) or [Mesos]({{ site.baseurl }}{% link ops/deployment/mesos.md %}). +TaskManagers connect to Flink Masters, announcing themselves as available, and +are assigned work. + +The *client* is not part of the runtime and program execution, but is used to +prepare and send a dataflow to the Flink Master.
After that, the client can +disconnect, or stay connected to receive progress reports. The client runs +either as part of the Java/Scala program that triggers the execution, or in the +command line process `./bin/flink run ...`. + + + +{% top %} + +## Tasks and Operator Chains + +For distributed execution, Flink *chains* operator subtasks together into +*tasks*. Each task is executed by one thread. Chaining operators together into +tasks is a useful optimization: it reduces the overhead of thread-to-thread +handover and buffering, and increases overall throughput while decreasing +latency. The chaining behavior can be configured; see the [chaining docs]({{ +site.baseurl }}{% link dev/stream/operators/index.md +%}#task-chaining-and-resource-groups) for details. + +The sample dataflow in the figure below is executed with five subtasks, and +hence with five parallel threads. + + + +{% top %} + +## Task Slots and Resources + +Each worker (TaskManager) is a *JVM process*, and may execute one or more +subtasks in separate threads. To control how many tasks a worker accepts, a +worker has so-called **task slots** (at least one). + +Each *task slot* represents a fixed subset of resources of the TaskManager. A +TaskManager with three slots, for example, will dedicate 1/3 of its managed +memory to each slot. Slotting the resources means that a subtask will not +compete with subtasks from other jobs for managed memory, but instead has a +certain amount of reserved managed memory. Note that no CPU isolation happens +here; currently slots only separate the managed memory of tasks. + +By adjusting the number of task slots, users can define how subtasks are +isolated from each other. Having one slot per TaskManager means each task +group runs in a separate JVM (which can be started in a separate container, for +example). Having multiple slots means more subtasks share the same JVM. Tasks +in the same JVM share TCP connections (via multiplexing) and heartbeat +messages.
They may also share data sets and data structures, thus reducing the +per-task overhead. + + + +By default, Flink allows subtasks to share slots even if they are subtasks of +different tasks, so long as they are from the same job. The result is that one Review comment: ```suggestion different operators, so long as they are from the same job. The result is that one ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
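[Editorial note] The slot-sharing rule discussed in this comment can be made concrete with a small sketch. The helper below is hypothetical, not a Flink API: with slot sharing, one slot may hold one subtask of each operator of the same job, so the number of slots a job needs equals its highest operator parallelism.

```python
# Hypothetical helper (not part of Flink) illustrating slot sharing.

def required_slots(operator_parallelisms, slot_sharing: bool = True) -> int:
    """With slot sharing, one slot can hold one subtask of *each* operator
    of the same job, so the job needs as many slots as its highest operator
    parallelism. Without sharing, every subtask occupies its own slot."""
    if slot_sharing:
        return max(operator_parallelisms)
    return sum(operator_parallelisms)

# source (p=6) -> map (p=6) -> sink (p=1)
print(required_slots([6, 6, 1], slot_sharing=True))   # 6
print(required_slots([6, 6, 1], slot_sharing=False))  # 13
```

This is why the quoted docs recommend slot sharing for better resource utilization: the heavy operators' subtasks spread evenly over the available TaskManagers instead of lighter subtasks pinning slots of their own.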
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379558964 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured Review comment: I would remove this section completely. It tries to cover too much too early. There is a section about state backends where this can be explained in more detail.
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379569140 ## File path: docs/concepts/stateful-stream-processing.md
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379553632 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual Review comment: This paragraph feels repetitive. Maybe mention that while the output of a stateless function only depends on the input, the output of a stateful function depends on the input as well as its current state. As the state is a function of the full history of events, the output of a stateful function can depend on the input as well as all previous inputs. Maybe this is too theoretical or mathematical. It is a slightly different view on it, though.
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379561027 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is Review comment: ```suggestion state is only possible on *keyed streams*, i.e. after a keyed partitioned data exchange, and is ```
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379574851 ## File path: docs/concepts/stateful-stream-processing.md
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379570828 ## File path: docs/concepts/stateful-stream-processing.md
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379567553 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
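The Operator State redistribution mentioned in the quoted draft can be sketched as follows. This is an illustration only, not Flink's API: it mimics list-style operator state, where each subtask holds a list of entries (for example the Kafka consumer's topic/partition/offset triples) that is flattened and re-split when the parallelism changes. Flink's even-split scheme hands out contiguous chunks; the round-robin dealing below is a simplification.

```python
def redistribute(state_lists, new_parallelism):
    # Flatten every subtask's list-style state, then deal the entries
    # out to the new subtasks. (Flink's "even-split" scheme assigns
    # contiguous chunks; round-robin is used here for brevity.)
    entries = [entry for subtask in state_lists for entry in subtask]
    redistributed = [[] for _ in range(new_parallelism)]
    for i, entry in enumerate(entries):
        redistributed[i % new_parallelism].append(entry)
    return redistributed

# Two hypothetical consumer subtasks tracking (topic, partition, offset)
# triples, rescaled to three subtasks: no entry is lost or duplicated.
old_state = [[("t", 0, 42)], [("t", 1, 7), ("t", 2, 99)]]
new_state = redistribute(old_state, 3)
assert sorted(e for lst in new_state for e in lst) == \
       sorted(e for lst in old_state for e in lst)
```

The key property either scheme must preserve is that the multiset of state entries is unchanged across a rescale; only their assignment to subtasks moves.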
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379555751 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md Review comment: "Flink needs to be aware of the state in order to allow savepoints." does not make sense to me." This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows.
flinkbot edited a comment on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows. URL: https://github.com/apache/flink/pull/11095#issuecomment-586287248 ## CI report: * 1529c3a8abb5e2b758e6a80e2f7359288f87e3d7 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148979015) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5179) * 0412a4320a08135f4f6c1f580e6eb60b7f763551 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149011380) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5190) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] StephanEwen commented on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows.
StephanEwen commented on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows. URL: https://github.com/apache/flink/pull/11095#issuecomment-586405488 @carp84 @Myasuka @klion26 Let me know if you are interested in reviewing this. Otherwise I would merge this in the next few days. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] StephanEwen commented on issue #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
StephanEwen commented on issue #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#issuecomment-586403438 Thanks, this is a great first step! Is the plan to merge this with the TODO comments? (I am personally fine with that) Or is that a base PR on which you want to base the next ones? A few minor suggestions: - I would remove the "Programming Model (outdated)" section. - Instead, having a starting page (maybe for concepts, maybe for the docs as a whole) that shows the stack (SQL/Table API / DataStream API / StateFun, those on top of the common runtime) - I would drop operator state from the introduction. Instead I would mention it in the `DataStream API` as something that is mainly targeted towards connectors. - What do you think about removing ingestion time from the "timely" docs? We could change it to just a short mention under "event time", or not mention it at all. - I would also remove the "Asynchronous State Snapshots" section and simply have a comment somewhere that all persistence operations are asynchronous. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-15956) Introduce an HttpRemoteFunction
[ https://issues.apache.org/jira/browse/FLINK-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037155#comment-17037155 ] Igal Shilman commented on FLINK-15956: -- The initial version of the Http remote function is implemented and merged to master (see the attached PR for details). > Introduce an HttpRemoteFunction > --- > > Key: FLINK-15956 > URL: https://issues.apache.org/jira/browse/FLINK-15956 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Labels: pull-request-available > Fix For: statefun-1.1 > > Time Spent: 20m > Remaining Estimate: 0h > > This issue introduces the remote HttpFunction based on a multi language > protocol. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-15956) Introduce an HttpRemoteFunction
[ https://issues.apache.org/jira/browse/FLINK-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-15956. Resolution: Fixed > Introduce an HttpRemoteFunction > --- > > Key: FLINK-15956 > URL: https://issues.apache.org/jira/browse/FLINK-15956 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Labels: pull-request-available > Fix For: statefun-1.1 > > Time Spent: 20m > Remaining Estimate: 0h > > This issue introduces the remote HttpFunction based on a multi language > protocol. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on issue #11099: [FLINK-16038][jdbc] Add hitRate metric for JDBCLookupFunction
flinkbot edited a comment on issue #11099: [FLINK-16038][jdbc] Add hitRate metric for JDBCLookupFunction URL: https://github.com/apache/flink/pull/11099#issuecomment-586328258 ## CI report: * 09de912ae24b854486670dd6122ea62758055bae Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148994917) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5188) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (FLINK-16066) StateBinder should pickup persisted values in inner class
[ https://issues.apache.org/jira/browse/FLINK-16066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-16066. Fix Version/s: statefun-1.1 Resolution: Fixed Fixed at b81ec64ee72fb04ff8bf0fe2045fb92523959efe > StateBinder should pickup persisted values in inner class > - > > Key: FLINK-16066 > URL: https://issues.apache.org/jira/browse/FLINK-16066 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Labels: pull-request-available > Fix For: statefun-1.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently StateBinder would only bind PersistedValues/PersistedTables if they > are declared directly as a field in a StatefulFunction, but this prevents > reuse through composition. > Ideally if I user annotates a field as Persisted, StateBinder should recurse > into that class and pick up all the PersistedStates/PersistedTables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-16062) Remove the use of a checkpointLock
[ https://issues.apache.org/jira/browse/FLINK-16062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-16062. Fix Version/s: statefun-1.1 Resolution: Fixed Fixed at 3ac3cfd95e4cebc7e7fce1d3bf9d9262be5cc0c4. > Remove the use of a checkpointLock > -- > > Key: FLINK-16062 > URL: https://issues.apache.org/jira/browse/FLINK-16062 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Labels: pull-request-available > Fix For: statefun-1.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Since Flink 1.10 the checkpoint lock is deprecated and useless. > Lets remove its usage from statefun. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-16061) Disable state multiplexing
[ https://issues.apache.org/jira/browse/FLINK-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-16061. Fix Version/s: statefun-1.1 Resolution: Fixed Implemented at 11f88f588eb963c4dee2fe517c607093bcdb147b > Disable state multiplexing > -- > > Key: FLINK-16061 > URL: https://issues.apache.org/jira/browse/FLINK-16061 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Fix For: statefun-1.1 > > > Since Flink 1.10 the cost of a RocksDB column family has reduced > substantially, there is no more reason to multiplex state in a single state > handle. > So lets remove that code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-16020) Use Flink-1.10 released version instead of the snapshot version
[ https://issues.apache.org/jira/browse/FLINK-16020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-16020. Fix Version/s: statefun-1.1 Resolution: Fixed Fixed at 9a1c1e8a3cbcec791161572401b8de8dca342c6c > Use Flink-1.10 released version instead of the snapshot version > --- > > Key: FLINK-16020 > URL: https://issues.apache.org/jira/browse/FLINK-16020 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Fix For: statefun-1.1 > > > Since Flink 1.10 was released, we can stop using Flink 1.10-SNAPSHOT, > this include both maven and docker. -- This message was sent by Atlassian Jira (v8.3.4#803005)