[GitHub] [flink] flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor
flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor URL: https://github.com/apache/flink/pull/11098#issuecomment-586328210 ## CI report: * df7f0151d94bc7705c87baf855ae3d8d57f7e463 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148994879) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5187) * 311d3e9843bd601a8de8bee78c2ecd34222d19d6 UNKNOWN * e72b2a38b7ec718b44a14f98897dbaf22f9d0d0d Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149078552) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5196) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] carp84 commented on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows.
carp84 commented on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows. URL: https://github.com/apache/flink/pull/11095#issuecomment-586565223 Thanks for the note @StephanEwen, checking now.
[jira] [Updated] (FLINK-16068) table with keyword-escaped columns and computed_column_expression columns
[ https://issues.apache.org/jira/browse/FLINK-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu updated FLINK-16068: Fix Version/s: 1.10.1 > table with keyword-escaped columns and computed_column_expression columns > - > > Key: FLINK-16068 > URL: https://issues.apache.org/jira/browse/FLINK-16068 > Project: Flink > Issue Type: Bug > Components: Table SQL / Client >Affects Versions: 1.10.0 >Reporter: pangliang >Priority: Major > Fix For: 1.10.1 > > > I use sql-client to create a table with keyword-escaped column and > computed_column_expression column, like this: > {code:java} > CREATE TABLE source_kafka ( > log STRING, > `time` BIGINT, > pt as proctime() > ) WITH ( > 'connector.type' = 'kafka', > 'connector.version' = 'universal', > 'connector.topic' = 'k8s-logs', > 'connector.startup-mode' = 'latest-offset', > 'connector.properties.zookeeper.connect' = > 'zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181/kafka', > 'connector.properties.bootstrap.servers' = 'kafka.default:9092', > 'connector.properties.group.id' = 'testGroup', > 'format.type'='json', > 'format.fail-on-missing-field' = 'true', > 'update-mode' = 'append' > ); > {code} > Then I simply used it : > {code:java} > SELECT * from source_kafka limit 10;{code} > got an exception: > {code:java} > java.io.IOException: Fail to run stream sql job > at > org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:164) > at > org.apache.zeppelin.flink.FlinkStreamSqlInterpreter.callSelect(FlinkStreamSqlInterpreter.java:108) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.callCommand(FlinkSqlInterrpeter.java:203) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.runSqlList(FlinkSqlInterrpeter.java:151) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.interpret(FlinkSqlInterrpeter.java:104) > at > org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103) > at > 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:676) > at > org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:569) > at org.apache.zeppelin.scheduler.Job.run(Job.java:172) > at > org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) > at > org.apache.zeppelin.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.table.api.SqlParserException: SQL parse failed. > Encountered "time" at line 1, column 12. > Was expecting one of: > "ABS" ... > "ARRAY" ... > "AVG" ... > "CARDINALITY" ... > "CASE" ... > "CAST" ... > "CEIL" ... > "CEILING" ... > .. > > at > org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:50) > at > org.apache.flink.table.planner.calcite.SqlExprToRexConverterImpl.convertToRexNodes(SqlExprToRexConverterImpl.java:79) > at > org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.scala:111) > at > org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3328) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2357) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2051) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2005) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:646) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181) > at > 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563) > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:148) > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:135) > at > org.apache.flink.table.planner.operations.SqlToOperationConverter.toQueryOperation(SqlToOperationConverter.java:522) > at > org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlQuery(SqlToOperationConverter.java:436) > at >
[jira] [Commented] (FLINK-16068) table with keyword-escaped columns and computed_column_expression columns
[ https://issues.apache.org/jira/browse/FLINK-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037445#comment-17037445 ] Jark Wu commented on FLINK-16068: - cc [~danny0405], could you help take a look at this? It seems there is a bug in the SQL parser when the table definition contains both a keyword-escaped column and a computed column. > table with keyword-escaped columns and computed_column_expression columns > - > > Key: FLINK-16068 > URL: https://issues.apache.org/jira/browse/FLINK-16068 > Project: Flink > Issue Type: Bug > Components: Table SQL / Client >Affects Versions: 1.10.0 >Reporter: pangliang >Priority: Major > > I use sql-client to create a table with keyword-escaped column and > computed_column_expression column, like this: > {code:java} > CREATE TABLE source_kafka ( > log STRING, > `time` BIGINT, > pt as proctime() > ) WITH ( > 'connector.type' = 'kafka', > 'connector.version' = 'universal', > 'connector.topic' = 'k8s-logs', > 'connector.startup-mode' = 'latest-offset', > 'connector.properties.zookeeper.connect' = > 'zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181/kafka', > 'connector.properties.bootstrap.servers' = 'kafka.default:9092', > 'connector.properties.group.id' = 'testGroup', > 'format.type'='json', > 'format.fail-on-missing-field' = 'true', > 'update-mode' = 'append' > ); > {code} > Then I simply used it : > {code:java} > SELECT * from source_kafka limit 10;{code} > got an exception: > {code:java} > java.io.IOException: Fail to run stream sql job > at > org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:164) > at > org.apache.zeppelin.flink.FlinkStreamSqlInterpreter.callSelect(FlinkStreamSqlInterpreter.java:108) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.callCommand(FlinkSqlInterrpeter.java:203) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.runSqlList(FlinkSqlInterrpeter.java:151) > at > org.apache.zeppelin.flink.FlinkSqlInterrpeter.interpret(FlinkSqlInterrpeter.java:104) > at > 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103) > at > org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:676) > at > org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:569) > at org.apache.zeppelin.scheduler.Job.run(Job.java:172) > at > org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) > at > org.apache.zeppelin.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.table.api.SqlParserException: SQL parse failed. > Encountered "time" at line 1, column 12. > Was expecting one of: > "ABS" ... > "ARRAY" ... > "AVG" ... > "CARDINALITY" ... > "CASE" ... > "CAST" ... > "CEIL" ... > "CEILING" ... > .. 
> > at > org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:50) > at > org.apache.flink.table.planner.calcite.SqlExprToRexConverterImpl.convertToRexNodes(SqlExprToRexConverterImpl.java:79) > at > org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.scala:111) > at > org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3328) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2357) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2051) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2005) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:646) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181) > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563) > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:148) > at > org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:135) > at >
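The parse failure reported in FLINK-16068 above can be illustrated with a minimal sketch. The class and method below are hypothetical helpers for illustration only, not Flink's actual planner code: the suspected failure mode is that, when expanding a query over a table with a computed column, the planner re-parses column expressions as SQL text, and a reserved word such as `time` that is emitted without backticks is rejected by the parser ("Encountered \"time\" at line 1, column 12").

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of identifier quoting (not Flink's planner code).
// A reserved word used as a column name must be backtick-escaped before the
// generated expression text is handed back to the SQL parser.
public class IdentifierQuoting {
    // A tiny stand-in for the parser's reserved-word list (assumption: the
    // real list is Calcite's, which is much larger).
    static final Set<String> RESERVED = Set.of("TIME", "DATE", "TIMESTAMP");

    // Re-adding backtick quoting for reserved words avoids the parse failure.
    static String quoteIfReserved(String identifier) {
        return RESERVED.contains(identifier.toUpperCase(Locale.ROOT))
                ? "`" + identifier + "`"
                : identifier;
    }
}
```

Under this assumption, the generated projection would read `` SELECT log, `time`, PROCTIME() AS pt `` instead of the unescaped form that triggers the exception.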
[jira] [Closed] (FLINK-16052) Homebrew test failed with 1.10.0 dist package
[ https://issues.apache.org/jira/browse/FLINK-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li closed FLINK-16052. - Fix Version/s: (was: 1.10.1) (was: 1.11.0) Assignee: Yu Li Resolution: Not A Bug The Homebrew PR has been merged, so closing the JIRA and removing the fix versions since this is not a Flink bug. Thanks all for the efforts! > Homebrew test failed with 1.10.0 dist package > - > > Key: FLINK-16052 > URL: https://issues.apache.org/jira/browse/FLINK-16052 > Project: Flink > Issue Type: Bug > Components: Deployment / Scripts >Affects Versions: 1.10.0 >Reporter: Yu Li >Assignee: Yu Li >Priority: Major > > After updating the Homebrew formula to 1.10.0 (in {{$(brew --repository > homebrew/core)}} directory) with patch of this > [PR|https://github.com/Homebrew/homebrew-core/pull/50110], executing `brew > install --build-from-source Formula/apache-flink.rb` and then `brew test > Formula/apache-flink.rb`, we could see below error: > {code:java} > [ERROR] Unexpected result: Error: Could not find or load main class > org.apache.flink.runtime.util.BashJavaUtils > [ERROR] The last line of the BashJavaUtils outputs is expected to be the > execution result, following the prefix 'BASH_JAVA_UTILS_EXEC_RESULT:' > Picked up _JAVA_OPTIONS: > -Djava.io.tmpdir=/private/tmp/apache-flink-test-20200214-33361-1jotper > -Duser.home=/Users/jueding/Library/Caches/Homebrew/java_cache > Error: Could not find or load main class > org.apache.flink.runtime.util.BashJavaUtils > [ERROR] Could not get JVM parameters properly. 
> Error: apache-flink: failed > Failed executing: > {code} > After a bisect checking on {{flink-dist/src/main/flink-bin/bin}} changes, > confirmed the above issue is related to FLINK-15488, but we will see new > errors like below after reverting FLINK-15488 (and FLINK-15519): > {code:java} > ==> /usr/local/Cellar/apache-flink/1.10.0/libexec/bin/start-cluster.sh > ==> /usr/local/Cellar/apache-flink/1.10.0/bin/flink run -p 1 > /usr/local/Cellar/apache-flink/1.10.0/libexec/examples/streaming/WordCount.jar > --input input --output result > Last 15 lines from > /Users/jueding/Library/Logs/Homebrew/apache-flink/test.02.flink: > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > ... 4 more > Caused by: > org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException: > Could not fulfill slot request b7f17c0928112209ae873d089123b1c6. Requested > resource profile (ResourceProfile{UNKNOWN}) is unfulfillable. 
> at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.lambda$fulfillPendingSlotRequestWithPendingTaskManagerSlot$9(SlotManagerImpl.java:772) > at > org.apache.flink.util.OptionalConsumer.ifNotPresent(OptionalConsumer.java:52) > at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.fulfillPendingSlotRequestWithPendingTaskManagerSlot(SlotManagerImpl.java:768) > at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.lambda$internalRequestSlot$7(SlotManagerImpl.java:755) > at > org.apache.flink.util.OptionalConsumer.ifNotPresent(OptionalConsumer.java:52) > at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.internalRequestSlot(SlotManagerImpl.java:755) > at > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.registerSlotRequest(SlotManagerImpl.java:314) > ... 27 more > Error: apache-flink: failed > Failed executing: > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor
flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor URL: https://github.com/apache/flink/pull/11098#issuecomment-586328210 ## CI report: * df7f0151d94bc7705c87baf855ae3d8d57f7e463 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148994879) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5187) * 311d3e9843bd601a8de8bee78c2ecd34222d19d6 UNKNOWN * e72b2a38b7ec718b44a14f98897dbaf22f9d0d0d UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] rkhachatryan commented on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
rkhachatryan commented on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586562655 Thanks for the feedback @AHeise, I've addressed the issue.
[GitHub] [flink] flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor
flinkbot edited a comment on issue #11098: [FLINK-16060][task] Implement working StreamMultipleInputProcessor URL: https://github.com/apache/flink/pull/11098#issuecomment-586328210 ## CI report: * df7f0151d94bc7705c87baf855ae3d8d57f7e463 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148994879) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5187) * 311d3e9843bd601a8de8bee78c2ecd34222d19d6 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#issuecomment-581294934 ## CI report: * 65debfebc1bad89160a6375b9c78ea8eb063 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/147155607) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4764) * 1385f42a76ba0060d37022873064e736d52ea0af Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/147163976) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4766) * 9c1edd078c466ee56f2aed16ce9098e35bfae69e Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149075460) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5194) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[jira] [Closed] (FLINK-14802) Multi vectorized read version support for hive orc read
[ https://issues.apache.org/jira/browse/FLINK-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-14802. Release Note: Support vectorized read for Hive 1.x versions. Resolution: Fixed master: 63e9e9d200ef4f17097ae236be09d308efe8b72a > Multi vectorized read version support for hive orc read > --- > > Key: FLINK-14802 > URL: https://issues.apache.org/jira/browse/FLINK-14802 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive, Connectors / ORC >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The vector api of Hive 1.x version are totally different from hive 2+. > So we need introduce more efforts to support vectorized read for hive 1.x.
[jira] [Updated] (FLINK-14802) Multi vectorized read version support for hive orc read
[ https://issues.apache.org/jira/browse/FLINK-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li updated FLINK-14802: - Release Note: Support vectorized read of ORC in Hive 1.x versions. (was: Support vectorized read for Hive 1.x versions.) > Multi vectorized read version support for hive orc read > --- > > Key: FLINK-14802 > URL: https://issues.apache.org/jira/browse/FLINK-14802 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive, Connectors / ORC >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The vector api of Hive 1.x version are totally different from hive 2+. > So we need introduce more efforts to support vectorized read for hive 1.x.
[GitHub] [flink] bowenli86 commented on issue #10730: [FLINK-14802][orc][hive] Multi vectorized read version support for hive orc read
bowenli86 commented on issue #10730: [FLINK-14802][orc][hive] Multi vectorized read version support for hive orc read URL: https://github.com/apache/flink/pull/10730#issuecomment-586558519 @JingsongLi thanks for the great effort!
[GitHub] [flink] bowenli86 closed pull request #10730: [FLINK-14802][orc][hive] Multi vectorized read version support for hive orc read
bowenli86 closed pull request #10730: [FLINK-14802][orc][hive] Multi vectorized read version support for hive orc read URL: https://github.com/apache/flink/pull/10730
[GitHub] [flink] hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#discussion_r379713878 ## File path: flink-ml-parent/flink-ml-lib/pom.xml ## @@ -57,4 +57,30 @@ under the License. 1.1.2 + + + + + + org.apache.maven.plugins + maven-shade-plugin + + + shade-flink + package + + shade + + + + + com.github.fommil.netlib:core Review comment: @zentol The license has already been added by this commit https://github.com/apache/flink/pull/8631/files Or do you mean we should remove the licensing for release-1.10? I think we can create a fix in another PR.
[GitHub] [flink] flinkbot edited a comment on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict
flinkbot edited a comment on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict URL: https://github.com/apache/flink/pull/10151#issuecomment-552442468 ## CI report: * c6d7f5e864076448dca590035a6a590dc5e25c44 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/135928709) * 682da0aec5dee14c09583468d15115e2a512c827 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/140139283) * 9c66fbe1e4d81c3656eba38d56d39dfe0c065f4f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142108012) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3864) * 1b23c2232e9717218a7c61c930c481cbcf2e6f2e Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/142214794) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3888) * 063b5e87dcef4a363f01a48e4af4fb9d3670429f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142218414) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3890) * d5dfd65a163634584e8eaeee452d5454b2d4fe45 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/143576795) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4193) * d1f5b89012dc266fc0c664085c1a2aef0c8b95ec Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/143669093) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4204) * f61c52bcb1420572c2ad94e3d2f1caafbf7f6081 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/143708993) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4215) * 9ea2c427c8f7046d78c122eea8f2d0c10c200224 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144479909) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4362) * fd4f362ceed0a4785c449c807cb46dac15f712ea Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148094252) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4978) * 8e6f927d76fed13e4c095619dee0d72e6a27359f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148125056) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4983) * 12a890c54ecc602a3d46e07e8ded4263469c Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148185789) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5005) * 1a8f90917d5fe5b928f4f70bad62a06f1ef9e3fa Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148207317) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5015) * d0f2df460885f14ad4d4f0331bde5ff2de9664ec Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148235635) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5018) * 97bdd27325be54d1ebfcdc2a4bb21b31d910d039 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148335745) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5040) * 4a56e798775f7f320a3e7b1436d28b00392caa09 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148395560) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5068) * 564913126f4ac833b52f587e2ccd9de2183dfd5f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148509000) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5081) * b2951a6743c376d92371d6eaf3f9be7048a3c29c Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149072760) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5193) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#issuecomment-581294934 ## CI report: * 65debfebc1bad89160a6375b9c78ea8eb063 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/147155607) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4764) * 1385f42a76ba0060d37022873064e736d52ea0af Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/147163976) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4766) * 9c1edd078c466ee56f2aed16ce9098e35bfae69e Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/149075460) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5194) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] buptljy commented on a change in pull request #11048: [FLINK-15966][runtime] Capture callstacks for RPC ask() calls to improve exceptions.
buptljy commented on a change in pull request #11048: [FLINK-15966][runtime] Capture callstacks for RPC ask() calls to improve exceptions. URL: https://github.com/apache/flink/pull/11048#discussion_r379743783 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaInvocationHandler.java ## @@ -370,4 +375,33 @@ public String getHostname() { public CompletableFuture getTerminationFuture() { return terminationFuture; } + + static Object deserializeValueIfNeeded(Object o, Method method) { + if (o instanceof SerializedValue) { + try { + return ((SerializedValue) o).deserializeValue(AkkaInvocationHandler.class.getClassLoader()); + } catch (IOException | ClassNotFoundException e) { + throw new CompletionException( + new RpcException( + "Could not deserialize the serialized payload of RPC method : " + method.getName(), e)); + } + } else { + return o; + } + } + + static Throwable resolveTimeoutException(Throwable exception, @Nullable Throwable callStackCapture, Method method) { + if (callStackCapture == null || (!(exception instanceof akka.pattern.AskTimeoutException))) { + return exception; + } + + final TimeoutException newException = new TimeoutException("Invocation of " + method + " timed out."); + newException.initCause(exception); + + // remove the stack frames coming from the proxy interface invocation + final StackTraceElement[] stackTrace = callStackCapture.getStackTrace(); + newException.setStackTrace(Arrays.copyOfRange(stackTrace, 3, stackTrace.length)); Review comment: Can we filter out the stack frames whose classes belong to o.a.f.runtime.rpc.akka instead of hard-coding the offset 3? Otherwise, someone may unintentionally change this function's call stack in the future.
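The reviewer's suggestion above can be sketched as follows. This is a standalone illustration, not the actual Flink patch: the class name and the exact package prefix matched are assumptions; the idea is to drop every leading frame belonging to the RPC proxy machinery rather than skipping a fixed number of frames.

```java
import java.util.Arrays;

// Sketch of the review suggestion: instead of Arrays.copyOfRange(stackTrace, 3, ...),
// drop all leading frames whose declaring class is in the RPC proxy package,
// so the trimming stays correct if the internal call depth ever changes.
public class StackTraceFilter {
    // Assumed package prefix of the proxy machinery ("o.a.f.runtime.rpc.akka").
    private static final String PROXY_PACKAGE = "org.apache.flink.runtime.rpc.akka";

    static StackTraceElement[] stripProxyFrames(StackTraceElement[] stackTrace) {
        int start = 0;
        // Advance past every leading frame from the proxy package.
        while (start < stackTrace.length
                && stackTrace[start].getClassName().startsWith(PROXY_PACKAGE)) {
            start++;
        }
        // Keep only the caller's frames.
        return Arrays.copyOfRange(stackTrace, start, stackTrace.length);
    }
}
```

One trade-off of prefix matching is that it also removes proxy frames that appear deeper than the fixed offset would, which is usually the desired behavior here.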
[GitHub] [flink] buptljy commented on a change in pull request #11048: [FLINK-15966][runtime] Capture callstacks for RPC ask() calls to improve exceptions.
buptljy commented on a change in pull request #11048: [FLINK-15966][runtime] Capture callstacks for RPC ask() calls to improve exceptions. URL: https://github.com/apache/flink/pull/11048#discussion_r379740014 ## File path: flink-core/src/main/java/org/apache/flink/configuration/AkkaOptions.java ## @@ -29,6 +29,18 @@ @PublicEvolving public class AkkaOptions { + /** +* Timeout for akka ask calls. Review comment: Did you forget to modify this :) ?
[GitHub] [flink] dianfu edited a comment on issue #11009: [Flink-15873] [cep] fix matches result not returned when existing earlier partial matches
dianfu edited a comment on issue #11009: [Flink-15873] [cep] fix matches result not returned when existing earlier partial matches URL: https://github.com/apache/flink/pull/11009#issuecomment-586555110 @shuai-xu Thanks for the PR. Actually, I don't think this is a bug. It is by design that matched results are only emitted when there are no partial matches that are earlier than the matched result; otherwise the after-match skip strategy would be broken. Taking the changed test case MatchRecognizedITCase#testAggregates as an example, I think the original results are correct. What's your thought?
[GitHub] [flink] flinkbot edited a comment on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict
flinkbot edited a comment on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict URL: https://github.com/apache/flink/pull/10151#issuecomment-552442468 ## CI report: * c6d7f5e864076448dca590035a6a590dc5e25c44 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/135928709) * 682da0aec5dee14c09583468d15115e2a512c827 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/140139283) * 9c66fbe1e4d81c3656eba38d56d39dfe0c065f4f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142108012) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3864) * 1b23c2232e9717218a7c61c930c481cbcf2e6f2e Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/142214794) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3888) * 063b5e87dcef4a363f01a48e4af4fb9d3670429f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142218414) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3890) * d5dfd65a163634584e8eaeee452d5454b2d4fe45 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/143576795) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4193) * d1f5b89012dc266fc0c664085c1a2aef0c8b95ec Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/143669093) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4204) * f61c52bcb1420572c2ad94e3d2f1caafbf7f6081 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/143708993) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4215) * 9ea2c427c8f7046d78c122eea8f2d0c10c200224 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144479909) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4362) * fd4f362ceed0a4785c449c807cb46dac15f712ea Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148094252) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4978) * 8e6f927d76fed13e4c095619dee0d72e6a27359f Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148125056) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4983) * 12a890c54ecc602a3d46e07e8ded4263469c Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148185789) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5005) * 1a8f90917d5fe5b928f4f70bad62a06f1ef9e3fa Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148207317) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5015) * d0f2df460885f14ad4d4f0331bde5ff2de9664ec Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148235635) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5018) * 97bdd27325be54d1ebfcdc2a4bb21b31d910d039 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148335745) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5040) * 4a56e798775f7f320a3e7b1436d28b00392caa09 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148395560) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5068) * 564913126f4ac833b52f587e2ccd9de2183dfd5f Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148509000) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5081) * b2951a6743c376d92371d6eaf3f9be7048a3c29c Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149072760) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5193) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
flinkbot edited a comment on issue #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#issuecomment-581294934 ## CI report: * 65debfebc1bad89160a6375b9c78ea8eb063 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/147155607) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4764) * 1385f42a76ba0060d37022873064e736d52ea0af Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/147163976) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4766) * 9c1edd078c466ee56f2aed16ce9098e35bfae69e UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (FLINK-16069) Create TaskDeploymentDescriptor in future.
huweihua created FLINK-16069: Summary: Create TaskDeploymentDescriptor in future. Key: FLINK-16069 URL: https://issues.apache.org/jira/browse/FLINK-16069 Project: Flink Issue Type: Improvement Components: Runtime / Task Reporter: huweihua Deploying tasks takes a long time when we submit a high-parallelism job. Execution#deploy runs in the main thread, so it blocks the JobMaster from processing other akka messages, such as heartbeats. The creation of the TaskDeploymentDescriptor takes most of that time, so we can move the creation into a future. For example, for a job [source(8000)->sink(8000)], moving all 16000 tasks from SCHEDULED to DEPLOYING took more than 1 minute. This caused TaskManager heartbeats to time out and the job never succeeded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
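A minimal sketch of the proposed change, moving the expensive descriptor creation off the main thread with a CompletableFuture. All names below are illustrative assumptions, not the actual Flink scheduler API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch: build the TaskDeploymentDescriptor-like payload on an
// I/O executor so the main thread stays free to process heartbeats; only the
// lightweight submission step would later run back on the main thread.
class DeploymentSketch {
    static CompletableFuture<String> createDescriptorAsync(int subtask, Executor ioExecutor) {
        return CompletableFuture.supplyAsync(
            () -> "descriptor-for-subtask-" + subtask, // stands in for the expensive serialization work
            ioExecutor);
    }
}
```

With 16000 tasks, 16000 such futures can be created without ever blocking the thread that answers heartbeat messages.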
[jira] [Updated] (FLINK-16068) table with keyword-escaped columns and computed_column_expression columns
[ https://issues.apache.org/jira/browse/FLINK-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pangliang updated FLINK-16068: -- Description: I use sql-client to create a table with keyword-escaped column and computed_column_expression column, like this: {code:java} CREATE TABLE source_kafka ( log STRING, `time` BIGINT, pt as proctime() ) WITH ( 'connector.type' = 'kafka', 'connector.version' = 'universal', 'connector.topic' = 'k8s-logs', 'connector.startup-mode' = 'latest-offset', 'connector.properties.zookeeper.connect' = 'zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181/kafka', 'connector.properties.bootstrap.servers' = 'kafka.default:9092', 'connector.properties.group.id' = 'testGroup', 'format.type'='json', 'format.fail-on-missing-field' = 'true', 'update-mode' = 'append' ); {code} Then I simply used it : {code:java} SELECT * from source_kafka limit 10;{code} got an exception: {code:java} java.io.IOException: Fail to run stream sql job at org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:164) at org.apache.zeppelin.flink.FlinkStreamSqlInterpreter.callSelect(FlinkStreamSqlInterpreter.java:108) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.callCommand(FlinkSqlInterrpeter.java:203) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.runSqlList(FlinkSqlInterrpeter.java:151) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.interpret(FlinkSqlInterrpeter.java:104) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:676) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:569) at org.apache.zeppelin.scheduler.Job.run(Job.java:172) at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) at 
org.apache.zeppelin.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered "time" at line 1, column 12. Was expecting one of: "ABS" ... "ARRAY" ... "AVG" ... "CARDINALITY" ... "CASE" ... "CAST" ... "CEIL" ... "CEILING" ... .. at org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:50) at org.apache.flink.table.planner.calcite.SqlExprToRexConverterImpl.convertToRexNodes(SqlExprToRexConverterImpl.java:79) at org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.scala:111) at org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3328) at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2357) at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2051) at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2005) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:646) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:148) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:135) at org.apache.flink.table.planner.operations.SqlToOperationConverter.toQueryOperation(SqlToOperationConverter.java:522) at 
org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlQuery(SqlToOperationConverter.java:436) at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:154) at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:66) at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:464) at org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:104) ... 13 more {code} I also did some tests, the following can run: {code:java} CREATE TABLE source_kafka ( log STRING, `a` BIGINT, pt as proctime() ) CREATE TABLE source_kafka ( log STRING,
[jira] [Created] (FLINK-16068) table with keyword-escaped columns and computed_column_expression columns
pangliang created FLINK-16068: - Summary: table with keyword-escaped columns and computed_column_expression columns Key: FLINK-16068 URL: https://issues.apache.org/jira/browse/FLINK-16068 Project: Flink Issue Type: Bug Components: Table SQL / Client Affects Versions: 1.10.0 Reporter: pangliang I use sql-client to create a table with keyword-escaped columns and computed_column_expression columns, like this: {code:java} CREATE TABLE source_kafka ( log STRING, `time` BIGINT, pt as proctime() ) WITH ( 'connector.type' = 'kafka', 'connector.version' = 'universal', 'connector.topic' = 'k8s-logs', 'connector.startup-mode' = 'latest-offset', 'connector.properties.zookeeper.connect' = 'zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181/kafka', 'connector.properties.bootstrap.servers' = 'kafka.default:9092', 'connector.properties.group.id' = 'testGroup', 'format.type'='json', 'format.fail-on-missing-field' = 'true', 'update-mode' = 'append' ); {code} Then I simply used it : {code:java} SELECT * from source_kafka limit 10;{code} got an exception: {code:java} java.io.IOException: Fail to run stream sql job at org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:164) at org.apache.zeppelin.flink.FlinkStreamSqlInterpreter.callSelect(FlinkStreamSqlInterpreter.java:108) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.callCommand(FlinkSqlInterrpeter.java:203) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.runSqlList(FlinkSqlInterrpeter.java:151) at org.apache.zeppelin.flink.FlinkSqlInterrpeter.interpret(FlinkSqlInterrpeter.java:104) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:676) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:569) at org.apache.zeppelin.scheduler.Job.run(Job.java:172) at 
org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) at org.apache.zeppelin.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:39) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered "time" at line 1, column 12. Was expecting one of: "ABS" ... "ARRAY" ... "AVG" ... "CARDINALITY" ... "CASE" ... "CAST" ... "CEIL" ... "CEILING" ... .. at org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:50) at org.apache.flink.table.planner.calcite.SqlExprToRexConverterImpl.convertToRexNodes(SqlExprToRexConverterImpl.java:79) at org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.scala:111) at org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3328) at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2357) at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2051) at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2005) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:646) at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181) at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:148) at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:135) at 
org.apache.flink.table.planner.operations.SqlToOperationConverter.toQueryOperation(SqlToOperationConverter.java:522) at org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlQuery(SqlToOperationConverter.java:436) at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:154) at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:66) at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:464) at org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:104)
[GitHub] [flink] hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#discussion_r379713878 File path: flink-ml-parent/flink-ml-lib/pom.xml (the diff adds a maven-shade-plugin execution named shade-flink, bound to the package phase, that shades com.github.fommil.netlib:core) Review comment: @zentol The license has already been added by https://github.com/apache/flink/pull/8631/files. Or do you mean we should remove the licensing for release-1.10 and release-1.9? I think we can create a fix for that in another PR.
[GitHub] [flink] hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt
hequn8128 commented on a change in pull request #10995: [FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt URL: https://github.com/apache/flink/pull/10995#discussion_r379713304 File path: flink-dist/src/main/assemblies/opt.xml (diff context around the flink-python_${scala.binary.version}-${project.version}.jar entry, mode 0644; the added lines were elided in this archive) Review comment: Hi @dianfu @zentol Thanks for your advice. I would be fine with adding an uber module, but I think there are some differences between the table module and the ml module. The table module needs an uber module to package all table sub-modules plus other modules, e.g., cep, while for ml there is no such requirement.
[GitHub] [flink] sunhaibotb commented on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict
sunhaibotb commented on issue #10151: [FLINK-14231] Handle the pending processing-time timers to make endInput semantics on the operator chain strict URL: https://github.com/apache/flink/pull/10151#issuecomment-586546096 Latest update to resolve conflicts with master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) * 34ac6e03b86281ba56a4108f0f713f7a769b1291 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149041446) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5192) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) * 34ac6e03b86281ba56a4108f0f713f7a769b1291 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149041446) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5192)
[GitHub] [flink] bowenli86 commented on issue #11052: FLINK-15975 Use LinkedHashMap for deterministic iterations
bowenli86 commented on issue #11052: FLINK-15975 Use LinkedHashMap for deterministic iterations URL: https://github.com/apache/flink/pull/11052#issuecomment-586484297 can you rename the ticket and this PR?
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) * 34ac6e03b86281ba56a4108f0f713f7a769b1291 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/149041446) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5192)
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) * 34ac6e03b86281ba56a4108f0f713f7a769b1291 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379569902 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
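[Editor's note] The key-group mechanism described in the quoted documentation can be illustrated with a small sketch. This is a simplification, not Flink's implementation: Python's built-in `hash` stands in for the murmur hash Flink applies, and the contiguous-range assignment mirrors the general idea rather than the exact arithmetic of Flink's `KeyGroupRangeAssignment`.

```python
def key_group_for(key, max_parallelism):
    # Every key deterministically maps to one of `max_parallelism` key groups.
    return hash(key) % max_parallelism

def key_group_range(operator_index, parallelism, max_parallelism):
    # Each parallel operator instance owns a contiguous range of key groups;
    # together the ranges partition [0, max_parallelism).
    start = operator_index * max_parallelism // parallelism
    end = (operator_index + 1) * max_parallelism // parallelism
    return range(start, end)

# Rescaling from parallelism 2 to 3 moves whole key groups, never individual
# keys: the same 128 groups are re-partitioned into three contiguous ranges.
MAX_PARALLELISM = 128
new_ranges = [key_group_range(i, 3, MAX_PARALLELISM) for i in range(3)]
assert sum(len(r) for r in new_ranges) == MAX_PARALLELISM
```

This is why the maximum parallelism fixes the number of key groups up front: redistribution only ever moves group boundaries, so state for one key never has to be split.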
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379575432 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379572141 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379560406 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing Review comment: As far as I remember, this was supposed to be about the conceptual differences between state handling in batch and stream processing. I don't remember exactly, what we talked about here. The only thing I can think of right now is, that the data structures in which state is stored need to be different in batch and stream processing for it to be efficient. I guess one could also say something about boundedness and expiry of state? @StephanEwen what did we have in mind here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379554124 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. Review comment: is the sequence of events the state or does the state store the sequence of events? I would say the former. Same for the examples below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379575236 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
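[Editor's note] The Kafka-offset example in the quoted documentation can be sketched: each parallel consumer instance checkpoints its partition-to-offset entries as operator state, and on rescaling the union of all entries is redistributed. Round-robin is one plausible redistribution scheme; the function name and data shapes here are illustrative, not Flink's API.

```python
def redistribute_operator_state(snapshots, new_parallelism):
    # Take the union of the (partition, offset) entries checkpointed by the
    # old instances, then deal them out round-robin to the new instances.
    union = [entry for snapshot in snapshots for entry in snapshot]
    new_snapshots = [[] for _ in range(new_parallelism)]
    for i, entry in enumerate(union):
        new_snapshots[i % new_parallelism].append(entry)
    return new_snapshots

# Two consumer instances scale up to three; every offset survives exactly
# once, it just lives on a different instance afterwards.
old = [[("topic-0", 42), ("topic-2", 7)], [("topic-1", 99)]]
rescaled = redistribute_operator_state(old, 3)
assert sorted(e for s in rescaled for e in s) == sorted(e for s in old for e in s)
```

The essential property is the one the assertion checks: redistribution preserves the multiset of entries, regardless of which instance ends up owning each one.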
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379587916 ## File path: docs/dev/stream/state/state.md ## @@ -22,66 +22,17 @@ specific language governing permissions and limitations under the License. --> -This document explains how to use Flink's state abstractions when developing an application. +In this section you will learn about the stateful abstractions that Flink Review comment: see above This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379586538 ## File path: docs/concepts/timely-stream-processing.md ## @@ -0,0 +1,237 @@ +--- +title: Timely Stream Processing +nav-id: timely-stream-processing +nav-pos: 3 +nav-title: Timely Stream Processing +nav-parent_id: concepts +--- + + +`TODO: add introduction` + +* This will be replaced by the TOC +{:toc} + +## Latency & Completeness + +`TODO: add these two sections` + +### Latency vs. Completeness in Batch & Stream Processing + +{% top %} + +## Event Time, Processing Time, and Ingestion Time + +When referring to time in a streaming program (for example to define windows), +one can refer to different notions of *time*: + +- **Processing time:** Processing time refers to the system time of the machine + that is executing the respective operation. + + When a streaming program runs on processing time, all time-based operations + (like time windows) will use the system clock of the machines that run the + respective operator. An hourly processing time window will include all + records that arrived at a specific operator between the times when the system + clock indicated the full hour. For example, if an application begins running + at 9:15am, the first hourly processing time window will include events + processed between 9:15am and 10:00am, the next window will include events + processed between 10:00am and 11:00am, and so on. + + Processing time is the simplest notion of time and requires no coordination + between streams and machines. It provides the best performance and the + lowest latency. 
However, in distributed and asynchronous environments + processing time does not provide determinism, because it is susceptible to + the speed at which records arrive in the system (for example from the message + queue), to the speed at which the records flow between operators inside the + system, and to outages (scheduled, or otherwise). + +- **Event time:** Event time is the time that each individual event occurred on + its producing device. This time is typically embedded within the records + before they enter Flink, and that *event timestamp* can be extracted from + each record. In event time, the progress of time depends on the data, not on + any wall clocks. Event time programs must specify how to generate *Event Time + Watermarks*, which is the mechanism that signals progress in event time. This + watermarking mechanism is described in a later section, + [below](#event-time-and-watermarks). + + In a perfect world, event time processing would yield completely consistent + and deterministic results, regardless of when events arrive, or their + ordering. However, unless the events are known to arrive in-order (by + timestamp), event time processing incurs some latency while waiting for + out-of-order events. As it is only possible to wait for a finite period of + time, this places a limit on how deterministic event time applications can + be. + + Assuming all of the data has arrived, event time operations will behave as + expected, and produce correct and consistent results even when working with + out-of-order or late events, or when reprocessing historic data. For example, + an hourly event time window will contain all records that carry an event + timestamp that falls into that hour, regardless of the order in which they + arrive, or when they are processed. (See the section on [late + events](#late-elements) for more information.) 
+ + + + Note that sometimes when event time programs are processing live data in + real-time, they will use some *processing time* operations in order to + guarantee that they are progressing in a timely fashion. + +- **Ingestion time:** Ingestion time is the time that events enter Flink. At + the source operator each record gets the source's current time as a + timestamp, and time-based operations (like time windows) refer to that + timestamp. + + *Ingestion time* sits conceptually in between *event time* and *processing + time*. Compared to *processing time*, it is slightly more expensive, but + gives more predictable results. Because *ingestion time* uses stable + timestamps (assigned once at the source), different window operations over + the records will refer to the same timestamp, whereas in *processing time* + each window operator may assign the record to a different window (based on + the local system clock and any transport delay). + + Compared to *event time*, *ingestion time* programs cannot handle any + out-of-order events or late data, but the programs don't have to specify how + to generate *watermarks*. + + Internally, *ingestion time* is treated much like *event time*, but with + automatic timestamp assignment and automatic watermark generation. + + + +{% top %} +
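The three notions of time quoted in the comment above are easy to illustrate with a short sketch. The following Python snippet is purely illustrative (Flink's actual window operators live in the Java/Scala DataStream API, not here): it assigns the same records to hourly tumbling windows once by their event timestamp and once by their arrival (processing) time.

```python
# Illustrative sketch (not Flink API): assigning records to hourly
# tumbling windows under event time vs. processing time.

HOUR = 3600  # window size in seconds

def window_start(timestamp: int) -> int:
    """Start of the hourly window that `timestamp` falls into."""
    return timestamp - (timestamp % HOUR)

# Records as (event_time, arrival_time) pairs; the second record
# arrives late, after the wall clock has crossed the hour boundary.
records = [(3500, 3550), (3590, 3700)]

event_time_windows = [window_start(ev) for ev, _ in records]
processing_time_windows = [window_start(arr) for _, arr in records]

print(event_time_windows)       # [0, 0] - both in the [0, 3600) window
print(processing_time_windows)  # [0, 3600] - late record lands elsewhere
```

Under event time both records belong to the first hour regardless of when they arrive; under processing time the late record is assigned to the next window, which is exactly the non-determinism the quoted text describes.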
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379562658 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + Review comment: I think, this introduction should already mention the concept and importance of state locality. Maybe with the typical figure of two-tiered architecture to state and logic fused into one thing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379554299 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant Review comment: ```suggestion Flink needs to be aware of the state in order to make it fault tolerant ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
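The examples in the quoted list (pattern state, pending aggregates, model parameters) all reduce to the same mechanic: an operator keeps a value per key and updates it as events arrive. A minimal sketch of the "pending aggregates" case — plain Python standing in for a per-key `ValueState`, not Flink code:

```python
# Illustrative sketch (not Flink API): a stateful per-key aggregation.
# The dict of pending sums plays the role of the state a Flink
# window operator would keep per key.
from collections import defaultdict

pending_sums: dict[str, int] = defaultdict(int)

def on_event(key: str, value: int) -> None:
    """Update the pending aggregate for `key` - a stateful operation."""
    pending_sums[key] += value

for key, value in [("a", 1), ("b", 5), ("a", 2)]:
    on_event(key, value)

print(dict(pending_sums))  # {'a': 3, 'b': 5}
```

The point the quoted doc makes is that Flink must know about `pending_sums` (via its state abstractions) to checkpoint it, restore it on failure, and redistribute it on rescaling.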
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379583881 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data +*streams*. + +There must always be at least one TaskManager. + +The Flink Master and TaskManagers can be started in various ways: directly on +the machines as a [standalone cluster]({{ site.baseurl }}{% link +ops/deployment/cluster_setup.md %}), in containers, or managed by resource +frameworks like [YARN]({{ site.baseurl }}{% link ops/deployment/yarn_setup.md +%}) or [Mesos]({{ site.baseurl }}{% link ops/deployment/mesos.md %}). +TaskManagers connect to Flink Masters, announcing themselves as available, and +are assigned work. + +The *client* is not part of the runtime and program execution, but is used to +prepare and send a dataflow to the Flink Master. 
After that, the client can +disconnect, or stay connected to receive progress reports. The client runs +either as part of the Java/Scala program that triggers the execution, or in the +command line process `./bin/flink run ...`. + + + +{% top %} + +## Tasks and Operator Chains + +For distributed execution, Flink *chains* operator subtasks together into +*tasks*. Each task is executed by one thread. Chaining operators together into +tasks is a useful optimization: it reduces the overhead of thread-to-thread +handover and buffering, and increases overall throughput while decreasing +latency. The chaining behavior can be configured; see the [chaining docs]({{ +site.baseurl }}{% link dev/stream/operators/index.md +%}#task-chaining-and-resource-groups) for details. + +The sample dataflow in the figure below is executed with five subtasks, and +hence with five parallel threads. + + + +{% top %} + +## Task Slots and Resources + +Each worker (TaskManager) is a *JVM process*, and may execute one or more +subtasks in separate threads. To control how many tasks a worker accepts, a +worker has so called **task slots** (at least one). + +Each *task slot* represents a fixed subset of resources of the TaskManager. A Review comment: There is already quite some operational content mixed in here. Could be shorter in the Concepts section in my opinion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379547482 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + + +Often there is a one-to-one correspondence between the transformations in the +programs and the operators in the dataflow. Sometimes, however, one +transformation may consist of multiple transformation operators. + +{% top %} + +## Parallel Dataflows + +Programs in Flink are inherently parallel and distributed. During execution, a +*stream* has one or more **stream partitions**, and each *operator* has one or +more **operator subtasks**. The operator subtasks are independent of one +another, and execute in different threads and possibly on different machines or +containers. 
+ +The number of operator subtasks is the **parallelism** of that particular +operator. The parallelism of a stream is always that of its producing operator. +Different operators of the same program may have different levels of +parallelism. + + Review comment: I would update this figure. Top: Logical (Data Flow) Graph Bottom: Physical Graph In the bottom graph, all subtasks for an operator together are *not* an operator. It looks like it in the figure. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
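The stream partitions and operator subtasks described in the hunk above rest on one mechanism: a keyed record is always routed to the same downstream subtask. A deterministic sketch of that routing (plain Python with a CRC32 hash as an assumed stand-in — Flink's runtime does this internally, with its own hash):

```python
# Illustrative sketch (not Flink internals): routing records of a
# keyed stream to the parallel subtasks of a downstream operator.
import zlib

parallelism = 3  # number of downstream operator subtasks

def target_subtask(key: str) -> int:
    """Deterministically pick the subtask that owns this key."""
    return zlib.crc32(key.encode()) % parallelism

events = ["user-1", "user-2", "user-1", "user-3"]
routes = [target_subtask(k) for k in events]

# The same key always reaches the same subtask, so per-key state
# stays local to one parallel instance.
assert routes[0] == routes[2]
print(routes)
```

Because routing depends only on the key and the parallelism, subtasks never need to coordinate on key ownership — which is what lets them run independently in different threads, machines, or containers.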
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379584764 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data +*streams*. + +There must always be at least one TaskManager. + +The Flink Master and TaskManagers can be started in various ways: directly on +the machines as a [standalone cluster]({{ site.baseurl }}{% link +ops/deployment/cluster_setup.md %}), in containers, or managed by resource +frameworks like [YARN]({{ site.baseurl }}{% link ops/deployment/yarn_setup.md +%}) or [Mesos]({{ site.baseurl }}{% link ops/deployment/mesos.md %}). +TaskManagers connect to Flink Masters, announcing themselves as available, and +are assigned work. + +The *client* is not part of the runtime and program execution, but is used to +prepare and send a dataflow to the Flink Master. 
After that, the client can +disconnect, or stay connected to receive progress reports. The client runs +either as part of the Java/Scala program that triggers the execution, or in the +command line process `./bin/flink run ...`. + + + +{% top %} + +## Tasks and Operator Chains + +For distributed execution, Flink *chains* operator subtasks together into +*tasks*. Each task is executed by one thread. Chaining operators together into +tasks is a useful optimization: it reduces the overhead of thread-to-thread +handover and buffering, and increases overall throughput while decreasing +latency. The chaining behavior can be configured; see the [chaining docs]({{ +site.baseurl }}{% link dev/stream/operators/index.md +%}#task-chaining-and-resource-groups) for details. + +The sample dataflow in the figure below is executed with five subtasks, and +hence with five parallel threads. + + + +{% top %} + +## Task Slots and Resources + +Each worker (TaskManager) is a *JVM process*, and may execute one or more +subtasks in separate threads. To control how many tasks a worker accepts, a +worker has so called **task slots** (at least one). + +Each *task slot* represents a fixed subset of resources of the TaskManager. A +TaskManager with three slots, for example, will dedicate 1/3 of its managed +memory to each slot. Slotting the resources means that a subtask will not +compete with subtasks from other jobs for managed memory, but instead has a +certain amount of reserved managed memory. Note that no CPU isolation happens +here; currently slots only separate the managed memory of tasks. + +By adjusting the number of task slots, users can define how subtasks are +isolated from each other. Having one slot per TaskManager means each task +group runs in a separate JVM (which can be started in a separate container, for +example). Having multiple slots means more subtasks share the same JVM. Tasks +in the same JVM share TCP connections (via multiplexing) and heartbeat +messages. 
They may also share data sets and data structures, thus reducing the +per-task overhead. + + + +By default, Flink allows subtasks to share slots even if they are subtasks of +different tasks, so long as they are from the same job. The result is that one +slot may hold an entire pipeline of the job. Allowing this *slot sharing* has +two main benefits: + + - A Flink cluster needs exactly as many task slots as the highest parallelism +used in the job. No need to calculate how many tasks (with varying +parallelism) a program contains in total. + + - It is easier to get better resource utilization. Without slot sharing, the +non-intensive *source/map()* subtasks would block as many resources as the +resource intensive *window* subtasks. With slot sharing, increasing the +base parallelism in our example from two to six yields full utilization of +the slotted resources, while making sure that the heavy subtasks are fairly +distributed among the TaskManagers. + + + +The APIs also include a
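The first slot-sharing benefit quoted above — "exactly as many task slots as the highest parallelism used in the job" — can be made concrete with a small sketch. The operator names and parallelisms below are assumed for illustration (loosely following the source/map vs. window example in the quoted text); real scheduling also depends on slot-sharing groups:

```python
# Illustrative sketch: slot demand with and without slot sharing.
# Assumed operator parallelisms for a hypothetical job.
parallelisms = {"source/map": 2, "window": 6, "sink": 2}

# With slot sharing, one slot can hold one subtask of every operator,
# so one slot hosts a whole pipeline of the job.
slots_with_sharing = max(parallelisms.values())

# Without sharing, every subtask needs its own slot.
slots_without_sharing = sum(parallelisms.values())

print(slots_with_sharing)     # 6
print(slots_without_sharing)  # 10
```

This is why, in the quoted example, raising the base parallelism to match the heavy window operator fully utilizes the slotted resources instead of leaving light source/map slots idle.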
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379565697 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are Review comment: I would not consider KeyGroups a conceptual topic, but rather an operational or internal one. I might miss something though. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
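The key-group scheme under discussion in the message above can be sketched concretely. The snippet below is a simplified model (inspired by, but not identical to, Flink's `KeyGroupRangeAssignment`; CRC32 is an assumed stand-in for the real hash): a key always hashes into one of `max_parallelism` key groups, and each operator instance owns a contiguous range of groups, so rescaling moves whole groups rather than individual keys.

```python
# Illustrative sketch (simplified, not Flink's exact implementation):
# assigning keys to key groups, and key groups to operator instances.
import zlib

max_parallelism = 128  # number of key groups, fixed for the job
parallelism = 4        # current number of parallel operator instances

def key_group(key: str) -> int:
    """Key group a key belongs to - stable across rescaling."""
    return zlib.crc32(key.encode()) % max_parallelism

def operator_index(group: int) -> int:
    """Operator instance that currently owns this key group."""
    return group * parallelism // max_parallelism

# Each instance owns max_parallelism / parallelism = 32 key groups.
g = key_group("user-17")
print(0 <= g < max_parallelism, 0 <= operator_index(g) < parallelism)  # True True
```

Whether this belongs in a concepts page or an internals page is exactly the reviewer's question; the mechanism itself is a few lines either way.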
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379583328 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data +*streams*. + +There must always be at least one TaskManager. + +The Flink Master and TaskManagers can be started in various ways: directly on +the machines as a [standalone cluster]({{ site.baseurl }}{% link +ops/deployment/cluster_setup.md %}), in containers, or managed by resource +frameworks like [YARN]({{ site.baseurl }}{% link ops/deployment/yarn_setup.md +%}) or [Mesos]({{ site.baseurl }}{% link ops/deployment/mesos.md %}). +TaskManagers connect to Flink Masters, announcing themselves as available, and +are assigned work. + +The *client* is not part of the runtime and program execution, but is used to +prepare and send a dataflow to the Flink Master. 
After that, the client can +disconnect, or stay connected to receive progress reports. The client runs +either as part of the Java/Scala program that triggers the execution, or in the +command line process `./bin/flink run ...`. + + + +{% top %} + +## Tasks and Operator Chains + +For distributed execution, Flink *chains* operator subtasks together into +*tasks*. Each task is executed by one thread. Chaining operators together into +tasks is a useful optimization: it reduces the overhead of thread-to-thread +handover and buffering, and increases overall throughput while decreasing +latency. The chaining behavior can be configured; see the [chaining docs]({{ +site.baseurl }}{% link dev/stream/operators/index.md +%}#task-chaining-and-resource-groups) for details. + +The sample dataflow in the figure below is executed with five subtasks, and +hence with five parallel threads. + + Review comment: Top: Logical Graph, there are no tasks in the logical graph Bottom: Physical Graph This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379567637 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcasted to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379573424 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379576013 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379569436 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379571150 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379587809 ## File path: docs/dev/stream/state/index.md ## @@ -25,23 +25,10 @@ specific language governing permissions and limitations under the License. --> -Stateful functions and operators store data across the processing of individual elements/events, making state a critical building block for -any type of more elaborate operation. - -For example: - - - When an application searches for certain event patterns, the state will store the sequence of events encountered so far. - - When aggregating events per minute/hour/day, the state holds the pending aggregates. - - When training a machine learning model over a stream of data points, the state holds the current version of the model parameters. - - When historic data needs to be managed, the state allows efficient access to events that occurred in the past. - -Flink needs to be aware of the state in order to make state fault tolerant using [checkpoints](checkpointing.html) and to allow [savepoints]({{ site.baseurl }}/ops/state/savepoints.html) of streaming applications. - -Knowledge about the state also allows for rescaling Flink applications, meaning that Flink takes care of redistributing state across parallel instances. - -The [queryable state](queryable_state.html) feature of Flink allows you to access state from outside of Flink during runtime. - -When working with state, it might also be useful to read about [Flink's state backends]({{ site.baseurl }}/ops/state/state_backends.html). Flink provides different state backends that specify how and where state is stored. State can be located on Java's heap or off-heap. 
Depending on your state backend, Flink can also *manage* the state for the application, meaning Flink deals with the memory management (possibly spilling to disk if necessary) to allow applications to hold very large state. State backends can be configured without changing your application logic. +In this section you will learn about the stateful abstractions that Flink Review comment: are abstractions stateful? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
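The Key Groups mechanism described in the quoted concepts text can be sketched in a few lines. The snippet below is an illustrative model only, not Flink's actual implementation (Flink applies a murmur hash on top of the key's `hashCode()`; a plain Python `hash` stands in for it here). Keys hash into a fixed number of key groups (the maximum parallelism), and contiguous ranges of key groups are assigned to operator instances, so rescaling moves whole key groups rather than individual keys:

```python
def assign_to_key_group(key, max_parallelism):
    # Hash the key into one of `max_parallelism` key groups.
    # Python's % always yields a non-negative result for a positive modulus,
    # so negative hashes are fine.
    return hash(key) % max_parallelism

def operator_index_for_key_group(max_parallelism, parallelism, key_group):
    # Key groups are split into `parallelism` contiguous ranges; each
    # operator instance owns one range.
    return key_group * parallelism // max_parallelism

# With max parallelism 128 and 4 parallel instances, each instance
# owns a contiguous range of 32 key groups.
owners = {operator_index_for_key_group(128, 4, g) for g in range(128)}
```

Because the group-to-instance assignment is range-based, a rescale only needs to hand whole key-group ranges between instances instead of reshuffling every key, which is why the key group is the atomic unit of redistribution.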
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379566326 ## File path: docs/concepts/stateful-stream-processing.md (quotes the same diff hunk as the comment above) Review comment: ```suggestion operator instances when the parallelism is changed. There are different ``` This is an automated message from the Apache Git Service.
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379546260 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + + +Often there is a one-to-one correspondence between the transformations in the +programs and the operators in the dataflow. Sometimes, however, one +transformation may consist of multiple transformation operators. Review comment: ```suggestion transformation may consist of multiple operators. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
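The streams-and-transformations model in the quoted stream-processing.md hunk can be illustrated with a minimal sketch. Plain Python generators stand in for Flink's dataflow here, purely for illustration (this is not Flink API): a dataflow chains one or more sources through transformation operators into sinks, and each operator consumes input streams and produces output streams.

```python
# Toy dataflow: source -> transformation -> sink, modeled with generators.

def source():
    """A (finite, for this sketch) stream of records."""
    yield from range(5)

def double(stream):
    """A transformation operator: takes one input stream, yields one output stream."""
    for record in stream:
        yield record * 2

collected = []

def sink(stream):
    """A sink terminates the dataflow by consuming the stream."""
    for record in stream:
        collected.append(record)

# Chaining the pieces builds the (here, linear) dataflow graph;
# records flow through it lazily as the sink pulls them.
sink(double(source()))
```

In a real Flink program the graph may be an arbitrary DAG with multiple sources and sinks, and one API-level transformation may expand into several runtime operators, as the review discussion above notes.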
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379568312 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink; the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism.
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcasted to all downstream tasks, where it is stored locally and
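The Key Group mechanics described in the quoted excerpt can be sketched in a few lines. This is a simplified illustration under assumed semantics (Flink's actual logic lives in `KeyGroupRangeAssignment` and applies a murmur hash to the key's `hashCode()`); none of the names below are Flink's API:

```python
# Sketch of keyed-state redistribution via key groups (simplified; not
# Flink's real implementation, which uses a murmur hash of the key hash).

def key_group_for(key, max_parallelism: int) -> int:
    # Every key deterministically belongs to exactly one key group.
    return hash(key) % max_parallelism

def operator_for(key_group: int, parallelism: int, max_parallelism: int) -> int:
    # Key groups map to operator subtasks in contiguous ranges, so
    # rescaling only moves whole key groups between parallel instances.
    return key_group * parallelism // max_parallelism

max_parallelism = 128          # number of key groups, fixed for the job
keys = ["user-1", "user-2", "user-3"]

# The key -> key-group mapping never changes ...
groups = {k: key_group_for(k, max_parallelism) for k in keys}

# ... only the key-group -> subtask mapping changes when rescaling 2 -> 4.
before = {k: operator_for(g, 2, max_parallelism) for k, g in groups.items()}
after = {k: operator_for(g, 4, max_parallelism) for k, g in groups.items()}
```

Because the maximum parallelism fixes the number of key groups up front, state for a key never needs to be re-hashed on rescaling — only whole key-group ranges are reassigned.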
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379582165 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will Review comment: ```suggestion There is always at least one *Flink Master*. A high-availability setup might ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379556110 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above, ending at:) +The [queryable state]({{ site.baseurl }}{% link Review comment: Queryable State allows you to...
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379570309 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379576154 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379581742 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379546166 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + + +Often there is a one-to-one correspondence between the transformations in the +programs and the operators in the dataflow. Sometimes, however, one Review comment: ```suggestion programs and the operators in the logical dataflow graph. Sometimes, however, one ```
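The source → transformation → sink structure described in the quoted excerpt can be sketched with a toy pipeline. This is purely illustrative under assumed semantics — `Stream`, `map`, and `sink` here are hypothetical names, not Flink's DataStream API:

```python
# Toy model of a streaming dataflow: source -> transformations -> sink.
# A program's chained calls form a small directed acyclic graph of operators.

class Stream:
    def __init__(self, events):
        self.events = iter(events)

    def map(self, fn):
        # Transformation: one output record per input record.
        return Stream(fn(e) for e in self.events)

    def filter(self, pred):
        # Transformation: forwards only records matching the predicate.
        return Stream(e for e in self.events if pred(e))

    def sink(self):
        # Sink: materializes the stream (here, into a list).
        return list(self.events)

source = Stream(["1", "2", "3", "4"])           # source
result = (source
          .map(int)                             # transformation
          .filter(lambda x: x % 2 == 0)         # transformation
          .sink())                              # sink
# result == [2, 4]
```

Each method call corresponds to one node in the logical dataflow graph, which is why the one-to-one correspondence between transformations and operators usually holds.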
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379552116 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a Review comment: Operations vs Operator vs Function vs Transformation?
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379571692 ## File path: docs/concepts/stateful-stream-processing.md (quoted documentation context identical to the stateful-stream-processing.md excerpt in the first comment above)
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379550966 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + + +Often there is a one-to-one correspondence between the transformations in the +programs and the operators in the dataflow. Sometimes, however, one +transformation may consist of multiple transformation operators. + +{% top %} + +## Parallel Dataflows + +Programs in Flink are inherently parallel and distributed. During execution, a +*stream* has one or more **stream partitions**, and each *operator* has one or +more **operator subtasks**. The operator subtasks are independent of one +another, and execute in different threads and possibly on different machines or +containers. 
+ +The number of operator subtasks is the **parallelism** of that particular +operator. The parallelism of a stream is always that of its producing operator. +Different operators of the same program may have different levels of +parallelism. + + + +Streams can transport data between two operators in a *one-to-one* (or Review comment: I think the different redistribution patterns that Fabian uses in his book are more to the point. I think it was: * Forward * Broadcast * Random * Keyed IMHO the additional classification in "Redistributing" and "One-to-one" does not help. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
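The redistribution patterns discussed in this thread (forward, broadcast, random, keyed) can be sketched outside of Flink. The helpers below are hypothetical illustrations of how records are routed between an upstream subtask and downstream subtasks, not Flink's internal classes:

```python
import random
import zlib

def forward(subtask_index):
    """One-to-one: records stay in the partition of the producing subtask."""
    return subtask_index

def broadcast(downstream_parallelism):
    """Broadcast: every downstream subtask receives the record."""
    return list(range(downstream_parallelism))

def random_channel(downstream_parallelism, rng=random.Random(42)):
    """Random: pick an arbitrary downstream subtask."""
    return rng.randrange(downstream_parallelism)

def keyed(key, downstream_parallelism):
    """Keyed: all records with the same key hash to the same subtask."""
    return zlib.crc32(str(key).encode("utf-8")) % downstream_parallelism
```

Only the keyed pattern guarantees that all records for a given key meet at the same subtask, which is what makes keyed state possible downstream.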
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379545948 ## File path: docs/concepts/stream-processing.md ## @@ -0,0 +1,96 @@ +--- +title: Stream Processing +nav-id: stream-processing +nav-pos: 1 +nav-title: Stream Processing +nav-parent_id: concepts +--- + + +`TODO: Add introduction` +* This will be replaced by the TOC +{:toc} + +## A Unified System for Batch & Stream Processing + +`TODO` + +{% top %} + +## Programs and Dataflows + +The basic building blocks of Flink programs are **streams** and +**transformations**. Conceptually a *stream* is a (potentially never-ending) +flow of data records, and a *transformation* is an operation that takes one or +more streams as input, and produces one or more output streams as a result. + +When executed, Flink programs are mapped to **streaming dataflows**, consisting +of **streams** and transformation **operators**. Each dataflow starts with one +or more **sources** and ends in one or more **sinks**. The dataflows resemble +arbitrary **directed acyclic graphs** *(DAGs)*. Although special forms of +cycles are permitted via *iteration* constructs, for the most part we will +gloss over this for simplicity. + + Review comment: In the glossary we call the "streaming dataflow" "logical graph" or "jobgraph". Might want to add this here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379566803 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be Review comment: ```suggestion support use cases where records
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379576359 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcasted to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379566028 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link Review comment: parallel operator instance => sub-task? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
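The key-group mechanics quoted in this file (exactly as many key groups as the defined maximum parallelism, with each parallel instance owning one or more contiguous groups) can be sketched as follows. This is a simplified illustration, not Flink's actual implementation:

```python
import zlib

def key_group(key, max_parallelism):
    # Exactly max_parallelism key groups exist; a key always lands in the
    # same group, independent of the operator's current parallelism.
    return zlib.crc32(str(key).encode("utf-8")) % max_parallelism

def operator_index(group, max_parallelism, parallelism):
    # Each parallel instance owns a contiguous range of key groups, so
    # rescaling moves whole groups between instances, never single keys.
    return group * parallelism // max_parallelism
```

Rescaling from, say, parallelism 2 to 4 only reassigns whole key groups, which is why Flink can redistribute keyed state without rehashing every key.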
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379574304 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcasted to all downstream tasks, where it is stored locally and
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379582452 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which one is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data Review comment: according to the glossary "(or more specifically, the subtasks)" does not make sense. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379551776 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are Review comment: ```suggestion across multiple events (for example window operators). These operations are ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379587424 ## File path: docs/concepts/timely-stream-processing.md ## @@ -0,0 +1,237 @@ +--- +title: Timely Stream Processing +nav-id: timely-stream-processing +nav-pos: 3 +nav-title: Timely Stream Processing +nav-parent_id: concepts +--- + + +`TODO: add introduction` + +* This will be replaced by the TOC +{:toc} + +## Latency & Completeness + +`TODO: add these two sections` + +### Latency vs. Completeness in Batch & Stream Processing + +{% top %} + +## Event Time, Processing Time, and Ingestion Time + +When referring to time in a streaming program (for example to define windows), +one can refer to different notions of *time*: + +- **Processing time:** Processing time refers to the system time of the machine + that is executing the respective operation. + + When a streaming program runs on processing time, all time-based operations + (like time windows) will use the system clock of the machines that run the + respective operator. An hourly processing time window will include all + records that arrived at a specific operator between the times when the system + clock indicated the full hour. For example, if an application begins running + at 9:15am, the first hourly processing time window will include events + processed between 9:15am and 10:00am, the next window will include events + processed between 10:00am and 11:00am, and so on. + + Processing time is the simplest notion of time and requires no coordination + between streams and machines. It provides the best performance and the + lowest latency. 
However, in distributed and asynchronous environments + processing time does not provide determinism, because it is susceptible to + the speed at which records arrive in the system (for example from the message + queue), to the speed at which the records flow between operators inside the + system, and to outages (scheduled, or otherwise). + +- **Event time:** Event time is the time that each individual event occurred on + its producing device. This time is typically embedded within the records + before they enter Flink, and that *event timestamp* can be extracted from + each record. In event time, the progress of time depends on the data, not on + any wall clocks. Event time programs must specify how to generate *Event Time + Watermarks*, which is the mechanism that signals progress in event time. This + watermarking mechanism is described in a later section, + [below](#event-time-and-watermarks). + + In a perfect world, event time processing would yield completely consistent + and deterministic results, regardless of when events arrive, or their + ordering. However, unless the events are known to arrive in-order (by + timestamp), event time processing incurs some latency while waiting for + out-of-order events. As it is only possible to wait for a finite period of + time, this places a limit on how deterministic event time applications can + be. + + Assuming all of the data has arrived, event time operations will behave as + expected, and produce correct and consistent results even when working with + out-of-order or late events, or when reprocessing historic data. For example, + an hourly event time window will contain all records that carry an event + timestamp that falls into that hour, regardless of the order in which they + arrive, or when they are processed. (See the section on [late + events](#late-elements) for more information.) 
+ + + + Note that sometimes when event time programs are processing live data in + real-time, they will use some *processing time* operations in order to + guarantee that they are progressing in a timely fashion. + +- **Ingestion time:** Ingestion time is the time that events enter Flink. At + the source operator each record gets the source's current time as a + timestamp, and time-based operations (like time windows) refer to that + timestamp. + + *Ingestion time* sits conceptually in between *event time* and *processing + time*. Compared to *processing time*, it is slightly more expensive, but + gives more predictable results. Because *ingestion time* uses stable + timestamps (assigned once at the source), different window operations over + the records will refer to the same timestamp, whereas in *processing time* + each window operator may assign the record to a different window (based on + the local system clock and any transport delay). + + Compared to *event time*, *ingestion time* programs cannot handle any + out-of-order events or late data, but the programs don't have to specify how + to generate *watermarks*. + + Internally, *ingestion time* is treated much like *event time*, but with + automatic timestamp assignment and automatic watermark generation. + + + +{% top %} +
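The hourly-window behaviour contrasted above for processing time and event time can be made concrete with a small sketch. These helpers are illustrative (tumbling-window assignment plus a bounded-out-of-orderness watermark), not the actual Flink API:

```python
HOUR_MS = 3600 * 1000

def hourly_window(timestamp_ms):
    # Tumbling one-hour window [start, end) containing the given
    # event-time (or ingestion-time) timestamp.
    start = timestamp_ms - (timestamp_ms % HOUR_MS)
    return (start, start + HOUR_MS)

def watermark(max_timestamp_seen_ms, max_out_of_orderness_ms):
    # Bounded-out-of-orderness watermark: signals that no events with a
    # timestamp at or below this value are expected anymore.
    return max_timestamp_seen_ms - max_out_of_orderness_ms
```

An event stamped 9:15 always falls into the [9:00, 10:00) window regardless of when it arrives, whereas a processing-time window depends on the wall clock at the operator when the record happens to be processed.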
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379571869 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink; the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism.
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
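[Editorial note] The key-group mechanics quoted above can be sketched in a few lines. This is an illustrative sketch, not Flink's implementation: the real assignment (in `KeyGroupRangeAssignment`) murmur-hashes the key's `hashCode()`, whereas this sketch uses Python's built-in `hash`, and both function names are invented.

```python
# Sketch of key groups (assumed names; not Flink's real code).
# Flink's KeyGroupRangeAssignment murmur-hashes key.hashCode(); here we use
# Python's hash() only to illustrate the idea.

def key_group_for(key, max_parallelism: int) -> int:
    """Each key deterministically belongs to exactly one of
    `max_parallelism` key groups."""
    return hash(key) % max_parallelism

def operator_index_for(key_group: int, max_parallelism: int,
                       parallelism: int) -> int:
    """Key groups are assigned to parallel operator instances in
    contiguous ranges; rescaling moves whole key groups, never
    individual keys."""
    return key_group * parallelism // max_parallelism

MAX_PARALLELISM = 128  # fixed for the lifetime of the job

# Rescaling from 2 to 4 instances only reassigns whole key groups:
for parallelism in (2, 4):
    ranges = {}
    for kg in range(MAX_PARALLELISM):
        idx = operator_index_for(kg, MAX_PARALLELISM, parallelism)
        ranges.setdefault(idx, []).append(kg)
    # every instance owns a contiguous, non-overlapping range of key groups
    assert all(r == list(range(r[0], r[-1] + 1)) for r in ranges.values())
```

Because the key-group-to-instance mapping is monotone, each parallel instance always owns a contiguous range of key groups, which is what makes state redistribution on rescaling cheap.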
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379584150 ## File path: docs/concepts/flink-architecture.md ## @@ -0,0 +1,140 @@ +--- +title: Flink Architecture +nav-id: flink-architecture +nav-pos: 4 +nav-title: Flink Architecture +nav-parent_id: concepts +--- + + +* This will be replaced by the TOC +{:toc} + +## Flink Applications and Flink Sessions + +`TODO: expand this section` + +{% top %} + +## Anatomy of a Flink Cluster + +`TODO: expand this section, especially about components of the Flink Master and +container environments` + +The Flink runtime consists of two types of processes: + + - The *Flink Master* coordinates the distributed execution. It schedules +tasks, coordinates checkpoints, coordinates recovery on failures, etc. + +There is always at least one *Flink Master*. A high-availability setup will +have multiple *Flink Masters*, one of which is always the *leader*, and +the others are *standby*. + + - The *TaskManagers* (also called *workers*) execute the *tasks* (or more +specifically, the subtasks) of a dataflow, and buffer and exchange the data +*streams*. + +There must always be at least one TaskManager. + +The Flink Master and TaskManagers can be started in various ways: directly on +the machines as a [standalone cluster]({{ site.baseurl }}{% link +ops/deployment/cluster_setup.md %}), in containers, or managed by resource +frameworks like [YARN]({{ site.baseurl }}{% link ops/deployment/yarn_setup.md +%}) or [Mesos]({{ site.baseurl }}{% link ops/deployment/mesos.md %}). +TaskManagers connect to Flink Masters, announcing themselves as available, and +are assigned work. + +The *client* is not part of the runtime and program execution, but is used to +prepare and send a dataflow to the Flink Master.
After that, the client can +disconnect, or stay connected to receive progress reports. The client runs +either as part of the Java/Scala program that triggers the execution, or in the +command line process `./bin/flink run ...`. + + + +{% top %} + +## Tasks and Operator Chains + +For distributed execution, Flink *chains* operator subtasks together into +*tasks*. Each task is executed by one thread. Chaining operators together into +tasks is a useful optimization: it reduces the overhead of thread-to-thread +handover and buffering, and increases overall throughput while decreasing +latency. The chaining behavior can be configured; see the [chaining docs]({{ +site.baseurl }}{% link dev/stream/operators/index.md +%}#task-chaining-and-resource-groups) for details. + +The sample dataflow in the figure below is executed with five subtasks, and +hence with five parallel threads. + + + +{% top %} + +## Task Slots and Resources + +Each worker (TaskManager) is a *JVM process*, and may execute one or more +subtasks in separate threads. To control how many tasks a worker accepts, a +worker has so-called **task slots** (at least one). + +Each *task slot* represents a fixed subset of resources of the TaskManager. A +TaskManager with three slots, for example, will dedicate 1/3 of its managed +memory to each slot. Slotting the resources means that a subtask will not +compete with subtasks from other jobs for managed memory, but instead has a +certain amount of reserved managed memory. Note that no CPU isolation happens +here; currently slots only separate the managed memory of tasks. + +By adjusting the number of task slots, users can define how subtasks are +isolated from each other. Having one slot per TaskManager means each task +group runs in a separate JVM (which can be started in a separate container, for +example). Having multiple slots means more subtasks share the same JVM. Tasks +in the same JVM share TCP connections (via multiplexing) and heartbeat +messages.
They may also share data sets and data structures, thus reducing the +per-task overhead. + + + +By default, Flink allows subtasks to share slots even if they are subtasks of +different tasks, so long as they are from the same job. The result is that one Review comment: ```suggestion different operators, so long as they are from the same job. The result is that one ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
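[Editorial note] The slot-sharing rule discussed in this comment can be made concrete with a small sketch. The helper below is hypothetical, not a Flink API: with slot sharing, one slot may hold one subtask of each operator of the same job, so the number of slots a job needs equals its highest operator parallelism.

```python
# Hypothetical helper (not part of Flink) illustrating slot sharing.

def required_slots(operator_parallelisms, slot_sharing: bool = True) -> int:
    """With slot sharing, one slot can hold one subtask of *each* operator
    of the same job, so the job needs as many slots as its highest operator
    parallelism. Without sharing, every subtask occupies its own slot."""
    if slot_sharing:
        return max(operator_parallelisms)
    return sum(operator_parallelisms)

# source (p=6) -> map (p=6) -> sink (p=1)
print(required_slots([6, 6, 1], slot_sharing=True))   # 6
print(required_slots([6, 6, 1], slot_sharing=False))  # 13
```

This is why the quoted docs recommend slot sharing for better resource utilization: the heavy operators' subtasks spread evenly over the available TaskManagers instead of lighter subtasks pinning slots of their own.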
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379558964 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured Review comment: I would remove this section completely. It tries to cover too much too early. There is a section about state backends where this can be explained in more detail.
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379569140 ## File path: docs/concepts/stateful-stream-processing.md
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379553632 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual Review comment: This paragraph feels repetitive. Maybe mention that while the output of a stateless function only depends on the input, the output of a stateful function depends on the input as well as its current state. As the state is a function of the full history of events, the output of a stateful function can depend on the input as well as all previous inputs. Maybe this is too theoretical or mathematical. It is a slightly different view on it, though.
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379561027 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is Review comment: ```suggestion state is only possible on *keyed streams*, i.e. after a keyed partitioned data exchange, and is ```
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379574851 ## File path: docs/concepts/stateful-stream-processing.md
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379570828 ## File path: docs/concepts/stateful-stream-processing.md
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379567553 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md +%}) of streaming applications. + +Knowledge about the state also allows for rescaling Flink applications, meaning +that Flink takes care of redistributing state across parallel instances. + +The [queryable state]({{ site.baseurl }}{% link +dev/stream/state/queryable_state.md %}) feature of Flink allows you to access +state from outside of Flink during runtime. 
+ +When working with state, it might also be useful to read about [Flink's state +backends]({{ site.baseurl }}{% link ops/state/state_backends.md %}). Flink +provides different state backends that specify how and where state is stored. +State can be located on Java's heap or off-heap. Depending on your state +backend, Flink can also *manage* the state for the application, meaning Flink +deals with the memory management (possibly spilling to disk if necessary) to +allow applications to hold very large state. State backends can be configured +without changing your application logic. + +* This will be replaced by the TOC +{:toc} + +## What is State? + +`TODO: expand this section` + +There are different types of state in Flink, the most-used type of state is +*Keyed State*. For special cases you can use *Operator State* and *Broadcast +State*. *Broadcast State* is a special type of *Operator State*. + +{% top %} + +## State in Stream & Batch Processing + +`TODO: What is this section about? Do we even need it?` + +{% top %} + +## Keyed State + +Keyed state is maintained in what can be thought of as an embedded key/value +store. The state is partitioned and distributed strictly together with the +streams that are read by the stateful operators. Hence, access to the key/value +state is only possible on *keyed streams*, after a *keyBy()* function, and is +restricted to the values associated with the current event's key. Aligning the +keys of streams and state makes sure that all state updates are local +operations, guaranteeing consistency without transaction overhead. This +alignment also allows Flink to redistribute the state and adjust the stream +partitioning transparently. + + + +Keyed State is further organized into so-called *Key Groups*. Key Groups are +the atomic unit by which Flink can redistribute Keyed State; there are exactly +as many Key Groups as the defined maximum parallelism. 
During execution each +parallel instance of a keyed operator works with the keys for one or more Key +Groups. + +`TODO: potentially leave out Operator State and Broadcast State from concepts documentation` + +## Operator State + +*Operator State* (or *non-keyed state*) is state that is bound to one +parallel operator instance. The [Kafka Connector]({{ site.baseurl }}{% link +dev/connectors/kafka.md %}) is a good motivating example for the use of +Operator State in Flink. Each parallel instance of the Kafka consumer maintains +a map of topic partitions and offsets as its Operator State. + +The Operator State interfaces support redistributing state among parallel +operator instances when the parallelism is changed. There can be different +schemes for doing this redistribution. + +## Broadcast State + +*Broadcast State* is a special type of *Operator State*. It was introduced to +support use cases where some data coming from one stream is required to be +broadcast to all downstream tasks, where it is stored locally and
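The Operator State redistribution mentioned in the quoted draft can be sketched as follows. This is an illustration only, not Flink's API: it mimics list-style operator state, where each subtask holds a list of entries (for example the Kafka consumer's topic/partition/offset triples) that is flattened and re-split when the parallelism changes. Flink's even-split scheme hands out contiguous chunks; the round-robin dealing below is a simplification.

```python
def redistribute(state_lists, new_parallelism):
    # Flatten every subtask's list-style state, then deal the entries
    # out to the new subtasks. (Flink's "even-split" scheme assigns
    # contiguous chunks; round-robin is used here for brevity.)
    entries = [entry for subtask in state_lists for entry in subtask]
    redistributed = [[] for _ in range(new_parallelism)]
    for i, entry in enumerate(entries):
        redistributed[i % new_parallelism].append(entry)
    return redistributed

# Two hypothetical consumer subtasks tracking (topic, partition, offset)
# triples, rescaled to three subtasks: no entry is lost or duplicated.
old_state = [[("t", 0, 42)], [("t", 1, 7), ("t", 2, 99)]]
new_state = redistribute(old_state, 3)
assert sorted(e for lst in new_state for e in lst) == \
       sorted(e for lst in old_state for e in lst)
```

The key property either scheme must preserve is that the multiset of state entries is unchanged across a rescale; only their assignment to subtasks moves.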
[GitHub] [flink] knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
knaufk commented on a change in pull request #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#discussion_r379555751 ## File path: docs/concepts/stateful-stream-processing.md ## @@ -0,0 +1,412 @@ +--- +title: Stateful Stream Processing +nav-id: stateful-stream-processing +nav-pos: 2 +nav-title: Stateful Stream Processing +nav-parent_id: concepts +--- + + +While many operations in a dataflow simply look at one individual *event at a +time* (for example an event parser), some operations remember information +across multiple events (for example window operators). These operations are +called **stateful**. + +Stateful functions and operators store data across the processing of individual +elements/events, making state a critical building block for any type of more +elaborate operation. + +For example: + + - When an application searches for certain event patterns, the state will +store the sequence of events encountered so far. + - When aggregating events per minute/hour/day, the state holds the pending +aggregates. + - When training a machine learning model over a stream of data points, the +state holds the current version of the model parameters. + - When historic data needs to be managed, the state allows efficient access +to events that occurred in the past. + +Flink needs to be aware of the state in order to make state fault tolerant +using [checkpoints]({{ site.baseurl}}{% link dev/stream/state/checkpointing.md +%}) and to allow [savepoints]({{ site.baseurl }}{%link ops/state/savepoints.md Review comment: "Flink needs to be aware of the state in order to allow savepoints." does not make sense to me." This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows.
flinkbot edited a comment on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows. URL: https://github.com/apache/flink/pull/11095#issuecomment-586287248 ## CI report: * 1529c3a8abb5e2b758e6a80e2f7359288f87e3d7 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148979015) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5179) * 0412a4320a08135f4f6c1f580e6eb60b7f763551 Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/149011380) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5190) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/149020890) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5191) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] StephanEwen commented on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows.
StephanEwen commented on issue #11095: [FLINK-10918][rocks-db-backend] Fix incremental checkpoints on Windows. URL: https://github.com/apache/flink/pull/11095#issuecomment-586405488 @carp84 @Myasuka @klion26 Let me know if you are interested in reviewing this. Otherwise I would merge this in the next few days. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] StephanEwen commented on issue #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section
StephanEwen commented on issue #11092: [FLINK-15999] Extract “Concepts” material from API/Library sections and start proper concepts section URL: https://github.com/apache/flink/pull/11092#issuecomment-586403438 Thanks, this is a great first step! Is the plan to merge this with the TODO comments? (I am personally fine with that) Or is that a base PR on which you want to base the next ones? A few minor suggestions: - I would remove the "Programming Model (outdated)" section. - Instead, having a starting page (maybe for concepts, maybe for the docs as a whole) that shows the stack (SQL/Table API / DataStream API / StateFun, those on top of the common runtime) - I would drop operator state from the introduction. Instead I would mention it in the `DataStream API` as something that is mainly targeted towards connectors. - What do you think about removing ingestion time from the "timely" docs? We could change it to just a short mention under "event time", or not mention it at all. - I would also remove the "Asynchronous State Snapshots" section and simply have a comment somewhere that all persistence operations are asynchronous. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-15956) Introduce an HttpRemoteFunction
[ https://issues.apache.org/jira/browse/FLINK-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037155#comment-17037155 ] Igal Shilman commented on FLINK-15956: -- The initial version of the Http remote function is implemented and merged to master (see the attached PR for details). > Introduce an HttpRemoteFunction > --- > > Key: FLINK-15956 > URL: https://issues.apache.org/jira/browse/FLINK-15956 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Labels: pull-request-available > Fix For: statefun-1.1 > > Time Spent: 20m > Remaining Estimate: 0h > > This issue introduces the remote HttpFunction based on a multi language > protocol. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-15956) Introduce an HttpRemoteFunction
[ https://issues.apache.org/jira/browse/FLINK-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-15956. Resolution: Fixed > Introduce an HttpRemoteFunction > --- > > Key: FLINK-15956 > URL: https://issues.apache.org/jira/browse/FLINK-15956 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Labels: pull-request-available > Fix For: statefun-1.1 > > Time Spent: 20m > Remaining Estimate: 0h > > This issue introduces the remote HttpFunction based on a multi language > protocol. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on issue #11099: [FLINK-16038][jdbc] Add hitRate metric for JDBCLookupFunction
flinkbot edited a comment on issue #11099: [FLINK-16038][jdbc] Add hitRate metric for JDBCLookupFunction URL: https://github.com/apache/flink/pull/11099#issuecomment-586328258 ## CI report: * 09de912ae24b854486670dd6122ea62758055bae Travis: [SUCCESS](https://travis-ci.com/flink-ci/flink/builds/148994917) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5188) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests
flinkbot edited a comment on issue #11096: [FLINK-16056][runtime][tests] fix CFRO creation in tests URL: https://github.com/apache/flink/pull/11096#issuecomment-586300012 ## CI report: * 6f1f37ee8baceee6d08b4c4d2439d46f2c144688 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/148983706) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5181) * f65641913b8e78475a174d4c45fcf6720324e894 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (FLINK-16066) StateBinder should pickup persisted values in inner class
[ https://issues.apache.org/jira/browse/FLINK-16066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-16066. Fix Version/s: statefun-1.1 Resolution: Fixed Fixed at b81ec64ee72fb04ff8bf0fe2045fb92523959efe > StateBinder should pickup persisted values in inner class > - > > Key: FLINK-16066 > URL: https://issues.apache.org/jira/browse/FLINK-16066 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Labels: pull-request-available > Fix For: statefun-1.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently StateBinder would only bind PersistedValues/PersistedTables if they > are declared directly as a field in a StatefulFunction, but this prevents > reuse through composition. > Ideally if I user annotates a field as Persisted, StateBinder should recurse > into that class and pick up all the PersistedStates/PersistedTables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-16062) Remove the use of a checkpointLock
[ https://issues.apache.org/jira/browse/FLINK-16062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-16062. Fix Version/s: statefun-1.1 Resolution: Fixed Fixed at 3ac3cfd95e4cebc7e7fce1d3bf9d9262be5cc0c4. > Remove the use of a checkpointLock > -- > > Key: FLINK-16062 > URL: https://issues.apache.org/jira/browse/FLINK-16062 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Labels: pull-request-available > Fix For: statefun-1.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Since Flink 1.10 the checkpoint lock is deprecated and useless. > Lets remove its usage from statefun. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-16061) Disable state multiplexing
[ https://issues.apache.org/jira/browse/FLINK-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-16061. Fix Version/s: statefun-1.1 Resolution: Fixed Implemented at 11f88f588eb963c4dee2fe517c607093bcdb147b > Disable state multiplexing > -- > > Key: FLINK-16061 > URL: https://issues.apache.org/jira/browse/FLINK-16061 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Fix For: statefun-1.1 > > > Since Flink 1.10 the cost of a RocksDB column family has reduced > substantially, there is no more reason to multiplex state in a single state > handle. > So lets remove that code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-16020) Use Flink-1.10 released version instead of the snapshot version
[ https://issues.apache.org/jira/browse/FLINK-16020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman closed FLINK-16020. Fix Version/s: statefun-1.1 Resolution: Fixed Fixed at 9a1c1e8a3cbcec791161572401b8de8dca342c6c > Use Flink-1.10 released version instead of the snapshot version > --- > > Key: FLINK-16020 > URL: https://issues.apache.org/jira/browse/FLINK-16020 > Project: Flink > Issue Type: Task > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > Fix For: statefun-1.1 > > > Since Flink 1.10 was released, we can stop using Flink 1.10-SNAPSHOT, > this include both maven and docker. -- This message was sent by Atlassian Jira (v8.3.4#803005)