[
https://issues.apache.org/jira/browse/FLINK-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007657#comment-16007657
]
ramkrishna.s.vasudevan commented on FLINK-6284:
---
I can try to do this by end of today IST.
> Incorrect sorting of completed checkpoints in
> ZooKeeperCompletedCheckpointStore
> ---
>
> Key: FLINK-6284
> URL: https://issues.apache.org/jira/browse/FLINK-6284
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
>Reporter: Xiaogang Shi
>Priority: Blocker
> Fix For: 1.3.0
>
>
> Currently all completed checkpoints are sorted by their paths when they are
> recovered in {{ZooKeeperCompletedCheckpointStore}}. In cases where the
> latest checkpoint's id is not the largest in lexical order (e.g., "100" is
> smaller than "99" in lexical order), Flink will not recover from the latest
> completed checkpoint.
> The problem can be easily observed by setting the checkpoint ids in
> {{ZooKeeperCompletedCheckpointStoreITCase#testRecover()}} to 99, 100, and
> 101.
> To fix the problem, we should explicitly sort the found checkpoints by their
> checkpoint ids, instead of using
> {{ZooKeeperStateHandleStore#getAllSortedByName()}}.
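The lexical-ordering pitfall described above can be sketched in a few lines of Java (the path names here are illustrative, not Flink's actual ZooKeeper layout):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class CheckpointSort {

    // Extract the numeric checkpoint id from a path such as "/checkpoints/99".
    static long checkpointId(String path) {
        return Long.parseLong(path.substring(path.lastIndexOf('/') + 1));
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/checkpoints/101", "/checkpoints/99", "/checkpoints/100");

        // Lexical (string) sort: "99" compares greater than "100" and "101",
        // so the latest checkpoint does NOT end up last.
        List<String> lexical = paths.stream().sorted().collect(Collectors.toList());
        System.out.println(lexical); // [/checkpoints/100, /checkpoints/101, /checkpoints/99]

        // Sorting by the parsed checkpoint id restores the intended order.
        List<String> numeric = paths.stream()
                .sorted(Comparator.comparingLong(CheckpointSort::checkpointId))
                .collect(Collectors.toList());
        System.out.println(numeric); // [/checkpoints/99, /checkpoints/100, /checkpoints/101]
    }
}
```

Parsing the id and comparing numerically, as the description suggests, restores the intended recovery order.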
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007656#comment-16007656
]
Fang Yong commented on FLINK-6564:
--
Thank you for the report. I also hit this problem on Windows. I found that
the path is configured in pom.xml as follows:
${basedir}/target/classes/META-INF/licenses/
${basedir}/licenses
LICENSE.*.txt
Could the outputDirectory be renamed to another path, such as
${basedir}/target/classes/META-INF/packaged_licenses/, to fix this problem?
[~StephanEwen]
> Build fails on file systems that do not distinguish between upper and lower
> case
>
>
> Key: FLINK-6564
> URL: https://issues.apache.org/jira/browse/FLINK-6564
> Project: Flink
> Issue Type: Bug
> Components: Build System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Priority: Blocker
> Fix For: 1.3.0, 1.4.0
>
>
> A recent change to the root pom tries to copy the licenses of bundled
> dependencies into the {{META-INF/license}} folder. However, the {{META-INF}}
> folder already contains a file {{LICENSE}}, which contains the AL2.
> File systems that do not distinguish upper and lower case cannot
> create the {{license}} folder because the {{LICENSE}} file exists.
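A build could guard against this class of clash by checking candidate names for case-insensitive collisions up front; a minimal, hypothetical sketch (not part of the actual pom change):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseCollision {

    // Returns pairs of names that would collide on a case-insensitive file system.
    static List<String> collisions(List<String> names) {
        Map<String, String> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String name : names) {
            String key = name.toLowerCase(Locale.ROOT);
            String previous = seen.putIfAbsent(key, name);
            if (previous != null && !previous.equals(name)) {
                out.add(previous + " <-> " + name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "LICENSE" (the existing file) vs. "license" (the new folder) clash
        // on file systems that ignore case.
        System.out.println(collisions(Arrays.asList("LICENSE", "license", "NOTICE")));
        // [LICENSE <-> license]
    }
}
```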
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007598#comment-16007598
]
ASF GitHub Bot commented on FLINK-6504:
---
Github user shixiaogang commented on a diff in the pull request:
https://github.com/apache/flink/pull/3859#discussion_r116148624
--- Diff:
flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java
---
@@ -911,9 +915,11 @@ void releaseResources(boolean canceled) {
if (canceled) {
List statesToDiscard = new
ArrayList<>();
- statesToDiscard.add(metaStateHandle);
- statesToDiscard.addAll(miscFiles.values());
- statesToDiscard.addAll(newSstFiles.values());
+ synchronized (this) {
--- End diff --
Yes, I agree. The key point here is to make sure the materialization
thread is stopped; synchronization is of little help here. So I would prefer to
remove the synchronization. What do you think?
> Lack of synchronization on materializedSstFiles in RocksDBKeyedStateBackend
> ---
>
> Key: FLINK-6504
> URL: https://issues.apache.org/jira/browse/FLINK-6504
> Project: Flink
> Issue Type: Sub-task
> Components: State Backends, Checkpointing
>Affects Versions: 1.3.0
>Reporter: Stefan Richter
>Assignee: Xiaogang Shi
>Priority: Blocker
> Fix For: 1.3.0
>
>
> Concurrent checkpoints could access `materializedSstFiles` in the
> `RocksDBStateBackend` simultaneously. This should be avoided.
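As a rough illustration of the hazard (class and field names here are illustrative, not the actual RocksDB backend code), guarding every access to the shared map with the same monitor keeps concurrent checkpoints from corrupting it:

```java
import java.util.HashMap;
import java.util.Map;

public class MaterializedFiles {
    private final Object lock = new Object();
    private final Map<Long, String> filesByCheckpoint = new HashMap<>();

    // Every access to the shared map goes through the same lock.
    void register(long checkpointId, String file) {
        synchronized (lock) {
            filesByCheckpoint.put(checkpointId, file);
        }
    }

    int size() {
        synchronized (lock) {
            return filesByCheckpoint.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MaterializedFiles files = new MaterializedFiles();
        // Two "checkpoints" registering files concurrently; without the shared
        // lock, a plain HashMap could lose updates or corrupt its structure.
        Thread a = new Thread(() -> { for (long i = 0; i < 1_000; i++) files.register(i, "sst-" + i); });
        Thread b = new Thread(() -> { for (long i = 1_000; i < 2_000; i++) files.register(i, "sst-" + i); });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(files.size()); // 2000
    }
}
```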
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007577#comment-16007577
]
ASF GitHub Bot commented on FLINK-6426:
---
Github user sunjincheng121 commented on a diff in the pull request:
https://github.com/apache/flink/pull/3806#discussion_r116146239
--- Diff: docs/dev/table_api.md ---
@@ -1485,14 +1485,14 @@ Joins, set operations, and non-windowed
aggregations are not supported yet.
{% top %}
-### Group Windows
+### Windows
--- End diff --
I'd like to call them `Windows`. All of the names are
`Tumble`, `Slide`, `Session`, and `Over`. It's consistent with the interface of
the Table API, because we only have the `window(...)` method in `Table.scala`,
although the parameters of `window(overWindows: OverWindow*)` and
`window(window: Window)` differ slightly.
When users use the Table API/SQL, they only notice that there are 4 kinds
of windows in Flink: `Tumble`, `Slide/Hop`, `Session`, and `Over`. We'd better
not expose the concept of group windows. What do you think?
Thanks,
SunJincheng
> Update the document of group-window table API
> -
>
> Key: FLINK-6426
> URL: https://issues.apache.org/jira/browse/FLINK-6426
> Project: Flink
> Issue Type: Sub-task
> Components: Documentation, Table API & SQL
>Affects Versions: 1.3.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>
> 1. Correct the method parameter type error in the group-window Table API
> document: change ` .window([w: Window] as 'w)` to
> ` .window([w: WindowWithoutAlias] as 'w)`.
> 2. For consistency between the Table API and SQL, change the heading of the
> SQL document from "Group Windows" to "Windows".
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007580#comment-16007580
]
ASF GitHub Bot commented on FLINK-6426:
---
Github user sunjincheng121 commented on a diff in the pull request:
https://github.com/apache/flink/pull/3806#discussion_r116146470
--- Diff: docs/dev/table_api.md ---
@@ -1099,14 +1099,14 @@ Temporal intervals can be represented as number of
months (`Types.INTERVAL_MONTH
The Table API is a declarative API to define queries on batch and
streaming tables. Projection, selection, and union operations can be applied
both on streaming and batch tables without additional semantics. Aggregations
on (possibly) infinite streaming tables, however, can only be computed on
finite groups of records. Window aggregates group rows into finite groups based
on time or row-count intervals and evaluate aggregation functions once per
group. For batch tables, windows are a convenient shortcut to group records by
time intervals.
-Windows are defined using the `window(w: Window)` clause and require an
alias, which is specified using the `as` clause. In order to group a table by a
window, the window alias must be referenced in the `groupBy(...)` clause like a
regular grouping attribute.
+Windows are defined using the `window(w: Window)` clause and the window
must have an alias. In order to group a table by a window, the window alias
must be referenced in the `groupBy(...)` clause like a regular grouping
attribute.
The following example shows how to define a window aggregation on a table.
{% highlight java %}
Table table = input
- .window([Window w].as("w")) // define window with alias w
+ .window([WindowWithoutAlias w].as("w")) // define window with alias w
--- End diff --
Yes, but the `as` method does not belong to `Window`. I think users may
be a little confused.
> Update the document of group-window table API
> -
>
> Key: FLINK-6426
> URL: https://issues.apache.org/jira/browse/FLINK-6426
> Project: Flink
> Issue Type: Sub-task
> Components: Documentation, Table API & SQL
>Affects Versions: 1.3.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>
> 1. Correct the method parameter type error in the group-window Table API
> document: change ` .window([w: Window] as 'w)` to
> ` .window([w: WindowWithoutAlias] as 'w)`.
> 2. For consistency between the Table API and SQL, change the heading of the
> SQL document from "Group Windows" to "Windows".
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007513#comment-16007513
]
ASF GitHub Bot commented on FLINK-6457:
---
GitHub user Xpray opened a pull request:
https://github.com/apache/flink/pull/3880
[FLINK-6457] Clean up ScalarFunction and TableFunction interface
Thanks for contributing to Apache Flink. Before you open your pull request,
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your
pull request. For more information and/or questions please refer to the [How To
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful
description of your changes.
- [x] General
- The pull request references the related JIRA issue ("[FLINK-XXX] Jira
title text")
- The pull request addresses only one issue
- Each commit in the PR has a meaningful commit message (including the
JIRA id)
- [ ] Documentation
- Documentation has been added for new functionality
- Old documentation affected by the pull request has been updated
- JavaDoc for public methods has been added
- [x] Tests & Build
- Functionality added by the pull request is covered by tests
- `mvn clean verify` has been executed successfully locally or a Travis
build has passed
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Xpray/flink FLINK-6457
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3880.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3880
commit c55d582227820110f7967ff8b39625deb2530f4d
Author: Xpray
Date: 2017-05-11T15:59:39Z
[FLINK-6457] Clean up ScalarFunction and TableFunction interface
> Clean up ScalarFunction and TableFunction interface
> ---
>
> Key: FLINK-6457
> URL: https://issues.apache.org/jira/browse/FLINK-6457
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
>Reporter: Ruidong Li
>Assignee: Ruidong Li
>
> Motivation:
> Some methods in ScalarFunction and TableFunction are unnecessary, e.g.,
> toString() and getResultType() in ScalarFunction.
> This issue intends to clean up the interfaces.
> Goal:
> Only methods related to `Collector` will remain in the TableFunction
> interface, and the ScalarFunction interface shall have no methods. Users can
> choose whether to implement the `getResultType` method, which will be called
> via reflection, and the Flink documentation will include instructions for users.
> Future:
> There should be annotations for users to mark methods, like `@Eval`
> for the eval method; this will be addressed in a follow-up issue.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Yu updated FLINK-6310:
--
Description:
Here is related code:
{code}
public void endSession(JobID jobID) throws Exception {
synchronized (LocalExecutor.class) {
LocalFlinkMiniCluster flink = this.flink;
{code}
In other places, the {{lock}} field is used for synchronization:
{code}
public void start() throws Exception {
synchronized (lock) {
{code}
> LocalExecutor#endSession() uses wrong lock for synchronization
> --
>
> Key: FLINK-6310
> URL: https://issues.apache.org/jira/browse/FLINK-6310
> Project: Flink
> Issue Type: Bug
> Components: Local Runtime
>Reporter: Ted Yu
>
> Here is related code:
> {code}
> public void endSession(JobID jobID) throws Exception {
> synchronized (LocalExecutor.class) {
> LocalFlinkMiniCluster flink = this.flink;
> {code}
> In other places, the {{lock}} field is used for synchronization:
> {code}
> public void start() throws Exception {
> synchronized (lock) {
> {code}
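The underlying rule is that mutual exclusion only works when all critical sections use the same monitor; a minimal sketch of the corrected pattern (names are hypothetical, not the actual LocalExecutor code):

```java
public class ConsistentLocking {
    private final Object lock = new Object();
    long count = 0; // shared state guarded by `lock`

    // Both methods synchronize on the SAME lock field, mirroring the fix the
    // issue asks for: endSession() should use the lock that start() uses,
    // not LocalExecutor.class (a different monitor, hence no mutual exclusion).
    void start() {
        synchronized (lock) { count++; }
    }

    void endSession() {
        synchronized (lock) { count++; }
    }

    public static void main(String[] args) throws InterruptedException {
        ConsistentLocking ex = new ConsistentLocking();
        Thread a = new Thread(() -> { for (int i = 0; i < 100_000; i++) ex.start(); });
        Thread b = new Thread(() -> { for (int i = 0; i < 100_000; i++) ex.endSession(); });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(ex.count); // 200000: no updates lost
    }
}
```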
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Fabian Hueske created FLINK-6564:
Summary: Build fails on file systems that do not distinguish
between upper and lower case
Key: FLINK-6564
URL: https://issues.apache.org/jira/browse/FLINK-6564
Project: Flink
Issue Type: Bug
Components: Build System
Affects Versions: 1.3.0, 1.4.0
Reporter: Fabian Hueske
Priority: Blocker
Fix For: 1.3.0, 1.4.0
A recent change to the root pom tries to copy the licenses of bundled
dependencies into the {{META-INF/license}} folder. However, the {{META-INF}}
folder already contains a file {{LICENSE}}, which contains the AL2.
File systems that do not distinguish upper and lower case cannot
create the {{license}} folder because the {{LICENSE}} file exists.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haohui Mai updated FLINK-6563:
--
Priority: Critical (was: Major)
Component/s: Table API & SQL
> Expose time indicator attributes in the KafkaTableSource
>
>
> Key: FLINK-6563
> URL: https://issues.apache.org/jira/browse/FLINK-6563
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Critical
>
> This is a follow-up to FLINK-5884.
> FLINK-5884 requires the {{TableSource}} interfaces to expose the
> processing time and the event time of the data stream. This jira proposes to
> expose these two attributes in the Kafka table source.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Haohui Mai created FLINK-6563:
-
Summary: Expose time indicator attributes in the KafkaTableSource
Key: FLINK-6563
URL: https://issues.apache.org/jira/browse/FLINK-6563
Project: Flink
Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
This is a follow-up to FLINK-5884.
FLINK-5884 requires the {{TableSource}} interfaces to expose the
processing time and the event time of the data stream. This jira proposes to
expose these two attributes in the Kafka table source.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007320#comment-16007320
]
ASF GitHub Bot commented on FLINK-6562:
---
GitHub user haohui opened a pull request:
https://github.com/apache/flink/pull/3879
[FLINK-6562] Support implicit table references for nested fields in SQL.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/haohui/flink FLINK-6562
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3879.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3879
commit 5ab697d32afa584a8683107006233889275ca73c
Author: Haohui Mai
Date: 2017-05-11T22:35:39Z
[FLINK-6562] Support implicit table references for nested fields in SQL.
> Support implicit table references for nested fields in SQL
> --
>
> Key: FLINK-6562
> URL: https://issues.apache.org/jira/browse/FLINK-6562
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> Currently nested fields can only be accessed through fully qualified
> identifiers. For example, users need to specify the following query for the
> table {{f}} that has a nested field {{foo.bar}}
> {noformat}
> SELECT f.foo.bar FROM f
> {noformat}
> Other query engines like Hive and Presto support implicit table references.
> For example:
> {noformat}
> SELECT foo.bar FROM f
> {noformat}
> This jira proposes to support the latter syntax in the SQL API.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007234#comment-16007234
]
ASF GitHub Bot commented on FLINK-6483:
---
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/3862
Thanks for the update @twalthr!
Looks very good. Will merge this.
> Support time materialization
>
>
> Key: FLINK-6483
> URL: https://issues.apache.org/jira/browse/FLINK-6483
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> FLINK-5884 added support for time indicators. However, there are still some
> features missing i.e. materialization of metadata timestamp.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007215#comment-16007215
]
ASF GitHub Bot commented on FLINK-6221:
---
Github user hadronzoo commented on the issue:
https://github.com/apache/flink/pull/3833
@mbode thanks for working on this!
One thing I've found useful when exporting Flink's statsd metrics to
Prometheus is to turn several of the metric fields into tags, such as `job_name`,
`task_name`, `operator_name`, etc. This [statsd-exporter
mapping](https://gist.github.com/hadronzoo/621b6a6dce7e2596d5643ce8d1e954ea)
has tags that have worked well for me. I'm not tagging host names or IP
addresses because Prometheus's Kubernetes support does that already, but that
could be useful for people running standalone clusters.
> Add Prometheus support to metrics
> -
>
> Key: FLINK-6221
> URL: https://issues.apache.org/jira/browse/FLINK-6221
> Project: Flink
> Issue Type: Improvement
> Components: Metrics
>Affects Versions: 1.2.0
>Reporter: Joshua Griffith
>Assignee: Maximilian Bode
>Priority: Minor
>
> [Prometheus|https://prometheus.io/] is becoming popular for metrics and
> alerting. It's possible to use
> [statsd-exporter|https://github.com/prometheus/statsd_exporter] to load Flink
> metrics into Prometheus but it would be far easier if Flink supported
> Promethus as a metrics reporter. A [dropwizard
> client|https://github.com/prometheus/client_java/tree/master/simpleclient_dropwizard]
> exists that could be integrated into the existing metrics system.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Haohui Mai created FLINK-6562:
-
Summary: Support implicit table references for nested fields in SQL
Key: FLINK-6562
URL: https://issues.apache.org/jira/browse/FLINK-6562
Project: Flink
Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
Currently nested fields can only be accessed through fully qualified
identifiers. For example, users need to specify the following query for the
table {{f}} that has a nested field {{foo.bar}}
{noformat}
SELECT f.foo.bar FROM f
{noformat}
Other query engines like Hive and Presto support implicit table references. For
example:
{noformat}
SELECT foo.bar FROM f
{noformat}
This jira proposes to support the latter syntax in the SQL API.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haohui Mai updated FLINK-6562:
--
Component/s: Table API & SQL
> Support implicit table references for nested fields in SQL
> --
>
> Key: FLINK-6562
> URL: https://issues.apache.org/jira/browse/FLINK-6562
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> Currently nested fields can only be accessed through fully qualified
> identifiers. For example, users need to specify the following query for the
> table {{f}} that has a nested field {{foo.bar}}
> {noformat}
> SELECT f.foo.bar FROM f
> {noformat}
> Other query engines like Hive and Presto support implicit table references.
> For example:
> {noformat}
> SELECT foo.bar FROM f
> {noformat}
> This jira proposes to support the latter syntax in the SQL API.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Github user haohui commented on a diff in the pull request:
https://github.com/apache/flink/pull/3748#discussion_r116092436
--- Diff:
flink-connectors/flink-connector-cassandra/src/test/java/org/apache/flink/streaming/connectors/cassandra/CassandraConnectorITCase.java
---
@@ -438,6 +462,45 @@ public void cancel() {
}
@Test
+ public void testCassandraTableSink() throws Exception {
+ StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
+ env.setParallelism(1);
+
+ DataStreamSource source =
env.fromCollection(rowCollection);
+ CassandraTableSink cassandraTableSink = new
CassandraTableSink(new ClusterBuilder() {
+ @Override
+ protected Cluster buildCluster(Cluster.Builder builder)
{
+ return builder.addContactPointsWithPorts(new
InetSocketAddress(HOST, PORT)).build();
+ }
+ }, INSERT_DATA_QUERY, new Properties());
+ CassandraTableSink newCassandrTableSink =
cassandraTableSink.configure(FIELD_NAMES, FIELD_TYPES);
+
+ newCassandrTableSink.emitDataStream(source);
+
+ env.execute();
+ ResultSet rs = session.execute(SELECT_DATA_QUERY);
+ Assert.assertEquals(20, rs.all().size());
+ }
+
+ @Test
+ public void testCassandraTableSinkE2E() throws Exception {
+ StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
+ env.setParallelism(4);
+ StreamTableEnvironment tEnv =
StreamTableEnvironment.getTableEnvironment(env);
--- End diff --
Agree. I think you're right. @PangZhi can you please make the suggested
changes?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[
https://issues.apache.org/jira/browse/FLINK-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007119#comment-16007119
]
Robert Metzger commented on FLINK-6284:
---
I think nobody is working on the issue right now.
How fast would you be able to provide a pull request to fix this?
I'm asking because this is one of the last blockers of the 1.3.0 release, and I
would like to create the next release candidate on Monday morning (CET).
> Incorrect sorting of completed checkpoints in
> ZooKeeperCompletedCheckpointStore
> ---
>
> Key: FLINK-6284
> URL: https://issues.apache.org/jira/browse/FLINK-6284
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
>Reporter: Xiaogang Shi
>Priority: Blocker
> Fix For: 1.3.0
>
>
> Currently all completed checkpoints are sorted by their paths when they are
> recovered in {{ZooKeeperCompletedCheckpointStore}}. In cases where the
> latest checkpoint's id is not the largest in lexical order (e.g., "100" is
> smaller than "99" in lexical order), Flink will not recover from the latest
> completed checkpoint.
> The problem can be easily observed by setting the checkpoint ids in
> {{ZooKeeperCompletedCheckpointStoreITCase#testRecover()}} to 99, 100, and
> 101.
> To fix the problem, we should explicitly sort the found checkpoints by their
> checkpoint ids, instead of using
> {{ZooKeeperStateHandleStore#getAllSortedByName()}}.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
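The lexical-vs-numeric ordering problem described above can be sketched in plain Java (method names here are illustrative, not Flink's actual API):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class CheckpointSorting {

    // Lexical (string) order, as sorting by ZooKeeper path effectively does:
    // "99" > "101" because '9' > '1' in the first character.
    static String latestLexical(List<String> checkpointIds) {
        return Collections.max(checkpointIds);
    }

    // Numeric order on the parsed checkpoint id, as the fix requires.
    static String latestNumeric(List<String> checkpointIds) {
        return Collections.max(checkpointIds, Comparator.comparingLong(Long::parseLong));
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("99", "100", "101");
        System.out.println(latestLexical(ids)); // "99"  -> recovery picks the wrong checkpoint
        System.out.println(latestNumeric(ids)); // "101" -> the actual latest checkpoint
    }
}
```

This mirrors the 99/100/101 scenario from the issue description: lexical order reports "99" as the maximum, so recovery would skip the newer checkpoints.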
[
https://issues.apache.org/jira/browse/FLINK-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007081#comment-16007081
]
ASF GitHub Bot commented on FLINK-6561:
---
Github user zentol commented on the issue:
https://github.com/apache/flink/pull/3878
Merging.
> GlobFilePathFilterTest#testExcludeFilenameWithStart fails on Windows
>
>
> Key: FLINK-6561
> URL: https://issues.apache.org/jira/browse/FLINK-6561
> Project: Flink
> Issue Type: Bug
> Components: Tests
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The test fails because it verifies that a file containing an asterisk {{*}}
> is still properly filtered. This character however is not allowed in file
> names, causing an exception when the nio Path is generated.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
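The Windows restriction driving this failure can be shown with a small self-contained sketch; `isLegalFileName` is a hypothetical helper, not part of Flink:

```java
import java.nio.file.Paths;

public class WindowsFileNameGuard {

    // Hypothetical helper: '*' (and '?') are legal in POSIX file names but are
    // rejected on Windows, where building a java.nio Path for such a name
    // throws InvalidPathException.
    static boolean isLegalFileName(String name, boolean onWindows) {
        return !(onWindows && (name.contains("*") || name.contains("?")));
    }

    public static void main(String[] args) {
        boolean onWindows = System.getProperty("os.name").toLowerCase().contains("windows");
        String name = "data*.txt";
        if (isLegalFileName(name, onWindows)) {
            System.out.println("path: " + Paths.get(name)); // safe on POSIX systems
        } else {
            System.out.println("skipping: '" + name + "' is illegal on this OS");
        }
    }
}
```

The same guard pattern is what "disabling the test on Windows" amounts to: check the platform before constructing the offending path.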
[
https://issues.apache.org/jira/browse/FLINK-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007080#comment-16007080
]
ASF GitHub Bot commented on FLINK-6558:
---
Github user zentol commented on the issue:
https://github.com/apache/flink/pull/3874
Merging.
> Yarn tests fail on Windows
> --
>
> Key: FLINK-6558
> URL: https://issues.apache.org/jira/browse/FLINK-6558
> Project: Flink
> Issue Type: Improvement
> Components: Tests, YARN
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> The following Yarn tests fail on Windows since they try to start up an HDFS
> cluster. This requires the Windows Hadoop extensions, which we can't ship, so
> these tests should be disabled on Windows.
> {code}
> YarnIntraNonHaMasterServicesTest
> YarnPreConfiguredMasterHaServicesTest
> YarnApplicationMasterRunnerTest
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007074#comment-16007074
]
ASF GitHub Bot commented on FLINK-6561:
---
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3878
Looks good, +1 to merge
> GlobFilePathFilterTest#testExcludeFilenameWithStart fails on Windows
>
>
> Key: FLINK-6561
> URL: https://issues.apache.org/jira/browse/FLINK-6561
> Project: Flink
> Issue Type: Bug
> Components: Tests
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The test fails because it verifies that a file containing an asterisk {{*}}
> is still properly filtered. This character however is not allowed in file
> names, causing an exception when the nio Path is generated.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007068#comment-16007068
]
ASF GitHub Bot commented on FLINK-6561:
---
GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/3878
[FLINK-6561] Disable glob test on Windows
This PR disables a test case in the `GlobFilePathFilterTest`. The test
verified that a file name containing asterisks `(*)` was properly filtered out;
this, however, can't succeed on Windows since asterisks aren't allowed in file
names in the first place. The creation of a nio path for such a file name
throws an exception.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink 6561_glog_test
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3878.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3878
commit f75edf840a1fb8633543518449f9373e40598001
Author: zentol
Date: 2017-05-11T19:24:02Z
[FLINK-6561] Disable glob test on Windows
> GlobFilePathFilterTest#testExcludeFilenameWithStart fails on Windows
>
>
> Key: FLINK-6561
> URL: https://issues.apache.org/jira/browse/FLINK-6561
> Project: Flink
> Issue Type: Bug
> Components: Tests
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The test fails because it verifies that a file containing an asterisk {{*}}
> is still properly filtered. This character however is not allowed in file
> names, causing an exception when the nio Path is generated.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007050#comment-16007050
]
ASF GitHub Bot commented on FLINK-6514:
---
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/3877
[backport] [FLINK-6514] [build] Create a proper separate Hadoop uber jar
for 'flink-dist' assembly
Backport of #3876 to `release-1.3`
This fixes the issue that Flink cannot be started locally if built with
Maven 3.3+
There are two big fixes in this pull request, because they do not
build/pass tests individually. The wrong Mesos dependencies were the reason
that the broken Hadoop fat jar build actually passed the Yarn tests.
# Hadoop Uber Jar
- This builds a proper Hadoop Uber Jar with all of Hadoop's needed
dependencies. The prior build was missing many important dependencies in the
Hadoop Uber Jar.
- The Hadoop-jar is no longer excluded in `flink-dist` via setting the
dependency to `provided`, but by explicit exclusion. That way, Hadoop's
transitive dependencies are not excluded from other dependencies as well.
Before this patch, various decompression codecs and Avro were broken in a Flink build,
due to accidental exclusion of their transitive dependencies.
# Dependency fixing
- This also fixes the dependencies of `flink-mesos`, which made all of
Hadoop's transitive dependencies its own dependencies, by promoting them during
shading. That way, Flink had various unnecessary dependencies in its
`flink-dist` jar.
- Incidentally, that brought Hadoop's accidentally excluded dependencies
back in, but into the `flink-dist` jar, not the `shaded-hadoop2` jar.
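The "explicit exclusion instead of `provided`" idea reads roughly like the following pom fragment; `flink-shaded-hadoop2` is named in the commits above, but the enclosing dependency coordinates are illustrative:

```xml
<!-- Sketch: keep the shaded Hadoop jar out of flink-dist explicitly,
     instead of marking the whole dependency 'provided'. This way the
     transitive dependencies of other artifacts are left intact. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-runtime_2.10</artifactId>
    <version>${project.version}</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-shaded-hadoop2</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

As the commit message notes, this is more tedious than a single `provided` scope, but it avoids accidentally stripping dependencies (such as `org.codehaus.jackson` used by Avro) that happen to be shared with Hadoop.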
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink fix_fat_jar_13
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3877.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3877
commit ac15bc32a3d786b50e8864a903d31d0b3e0c3042
Author: Stephan Ewen
Date: 2017-05-11T13:12:10Z
[hotfix] [build] Drop transitive jersey/jettison/servlet dependencies
pulled via Hadoop
commit 5e94787ce4e14f5da88e71418200b4bbe517483b
Author: Stephan Ewen
Date: 2017-05-11T13:12:50Z
[FLINK-6546] [build] Fix dependencies of flink-mesos
- This makes all flink-related dependencies 'provided' to not have the
transitive dependencies promoted
- Drops the unnecessary dependency on the Hadoop artifact
- Adds directly referenced libraries, like jackson
- Deactivates default logging of tests
commit b568ccfdf7366056d29ee43d14c606cfc4448bab
Author: Stephan Ewen
Date: 2017-05-11T15:32:03Z
[build] Reduce flink-avro's compile dependency from 'flink-java' to
'flink-core'
commit 84c150a5798f029bb9aced998ad6b81dd8dc8de5
Author: Stephan Ewen
Date: 2017-05-11T15:35:43Z
[FLINK-6514] [build] Remove 'flink-shaded-hadoop2' from 'flink-dist' via
exclusions
This is more tedious/manual than setting it to 'provided' once, but it
is also correct.
For example, in the case of Hadoop 2.3, having 'flink-shaded-hadoop2' as
'provided'
removes other needed dependencies as well, such as 'org.codehaus.jackson'
from avro.
commit 99658870865c15ad0996066cf94c721f30bc86ca
Author: Stephan Ewen
Date: 2017-05-11T11:52:25Z
[FLINK-6514] [build] Merge bin and lib assembly
commit 93e37c666aba50988a48b9273d7b531434c5d5b1
Author: Stephan Ewen
Date: 2017-05-11T15:00:03Z
[FLINK-6514] [build] Create a proper separate Hadoop uber jar for
'flink-dist' assembly
> Cannot start Flink Cluster in standalone mode
> -
>
> Key: FLINK-6514
> URL: https://issues.apache.org/jira/browse/FLINK-6514
> Project: Flink
> Issue Type: Bug
> Components: Build System, Cluster Management
>Reporter: Aljoscha Krettek
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.3.0
>
>
> The changes introduced for FLINK-5998 change what goes into the
> {{flink-dist}} fat jar. As it is, this means that trying to start a cluster
> results in a {{ClassNotFoundException}} of {{LogFactory}} in
> {{commons-logging}}.
> One solution is to now make the shaded Hadoop jar a proper fat-jar, so that
> we again have all the dependencies.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007041#comment-16007041
]
ASF GitHub Bot commented on FLINK-6514:
---
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/3876
[FLINK-6514] [build] Create a proper separate Hadoop uber jar for
'flink-dist' assembly
This fixes the issue that Flink cannot be started locally if built with
Maven 3.3+
There are two big fixes in this pull request, because they do not
build/pass tests individually. The wrong Mesos dependencies were the reason
that the broken Hadoop fat jar build actually passed the Yarn tests.
# Hadoop Uber Jar
- This builds a proper Hadoop Uber Jar with all of Hadoop's needed
dependencies. The prior build was missing many important dependencies in the
Hadoop Uber Jar.
- The Hadoop-jar is no longer excluded in `flink-dist` via setting the
dependency to `provided`, but by explicit exclusion. That way, Hadoop's
transitive dependencies are not excluded from other dependencies as well.
Before this patch, various decompression codecs and Avro were broken in a Flink build,
due to accidental exclusion of their transitive dependencies.
# Dependency fixing
- This also fixes the dependencies of `flink-mesos`, which made all of
Hadoop's transitive dependencies its own dependencies, by promoting them during
shading. That way, Flink had various unnecessary dependencies in its
`flink-dist` jar.
- Incidentally, that brought Hadoop's accidentally excluded dependencies
back in, but into the `flink-dist` jar, not the `shaded-hadoop2` jar.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink fix_fat_jar
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3876.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3876
commit 86803d23dfdc56f2be64274a6a90a76c1e782f08
Author: Stephan Ewen
Date: 2017-05-11T13:12:50Z
[FLINK-6546] [build] Fix dependencies of flink-mesos
- This makes all flink-related dependencies 'provided' to not have the
transitive dependencies promoted
- Drops the unnecessary dependency on the Hadoop artifact
- Adds directly referenced libraries, like jackson
- Deactivates default logging of tests
commit a14137c109e73738a3d1f89a3d99e2fd2a799219
Author: Stephan Ewen
Date: 2017-05-11T15:32:03Z
[build] Reduce flink-avro's compile dependency from 'flink-java' to
'flink-core'
commit 32e8574498d7963e2ab58f1530b41a6853f23601
Author: Stephan Ewen
Date: 2017-05-11T15:35:43Z
[FLINK-6514] [build] Remove 'flink-shaded-hadoop2' from 'flink-dist' via
exclusions
This is more tedious/manual than setting it to 'provided' once, but it
is also correct.
For example, in the case of Hadoop 2.3, having 'flink-shaded-hadoop2' as
'provided'
removes other needed dependencies as well, such as 'org.codehaus.jackson'
from avro.
commit efaed902a78c6a7f236e0dad4f72ed7ae8bad1c0
Author: Stephan Ewen
Date: 2017-05-11T11:52:25Z
[FLINK-6514] [build] Merge bin and lib assembly
commit df8efd5cdadaeef9323472f20871871c94d14af5
Author: Stephan Ewen
Date: 2017-05-11T15:00:03Z
[FLINK-6514] [build] Create a proper separate Hadoop uber jar for
'flink-dist' assembly
> Cannot start Flink Cluster in standalone mode
> -
>
> Key: FLINK-6514
> URL: https://issues.apache.org/jira/browse/FLINK-6514
> Project: Flink
> Issue Type: Bug
> Components: Build System, Cluster Management
>Reporter: Aljoscha Krettek
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.3.0
>
>
> The changes introduced for FLINK-5998 change what goes into the
> {{flink-dist}} fat jar. As it is, this means that trying to start a cluster
> results in a {{ClassNotFoundException}} of {{LogFactory}} in
> {{commons-logging}}.
> One solution is to now make the shaded Hadoop jar a proper fat-jar, so that
> we again have all the dependencies.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Chesnay Schepler created FLINK-6561:
---
Summary: GlobFilePathFilterTest#testExcludeFilenameWithStart fails
on Windows
Key: FLINK-6561
URL: https://issues.apache.org/jira/browse/FLINK-6561
Project: Flink
Issue Type: Bug
Components: Tests
Affects Versions: 1.3.0, 1.4.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Priority: Trivial
The test fails because it verifies that a file containing an asterisk {{*}} is
still properly filtered. This character, however, is not allowed in file names
on Windows, causing an exception when the nio Path is generated.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007026#comment-16007026
]
Greg Hogan commented on FLINK-6560:
---
I had thought {{flink-tests}} was taking longer, but I noticed a build with
{{forkCount=1}} where top was at 100% and the build was only a few seconds slower.
> Restore maven parallelism in flink-tests
>
>
> Key: FLINK-6560
> URL: https://issues.apache.org/jira/browse/FLINK-6560
> Project: Flink
> Issue Type: Bug
> Components: Build System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Critical
> Fix For: 1.3.0, 1.4.0
>
>
> FLINK-6506 added the maven variable {{flink.forkCountTestPackage}} which is
> used by the TravisCI script but no default value is set.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007021#comment-16007021
]
ASF GitHub Bot commented on FLINK-6560:
---
GitHub user greghogan opened a pull request:
https://github.com/apache/flink/pull/3875
[FLINK-6560] [build] Restore maven parallelism in flink-tests
Configure a default value for Maven variable 'flink.forkCountTestPackage'.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/greghogan/flink
6560_restore_maven_parallelism_in_flink_tests
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3875.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3875
commit 66e1a3144546680f63a81e3ef93823fddb512996
Author: Greg Hogan
Date: 2017-05-11T17:44:20Z
[FLINK-6560] [build] Restore maven parallelism in flink-tests
Configure a default value for Maven variable 'flink.forkCountTestPackage'.
> Restore maven parallelism in flink-tests
>
>
> Key: FLINK-6560
> URL: https://issues.apache.org/jira/browse/FLINK-6560
> Project: Flink
> Issue Type: Bug
> Components: Build System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Critical
> Fix For: 1.3.0, 1.4.0
>
>
> FLINK-6506 added the maven variable {{flink.forkCountTestPackage}} which is
> used by the TravisCI script but no default value is set.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
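The fix described above amounts to giving the variable a default in the pom; a hedged sketch, where `1C` (one fork per CPU core) is an assumed value, not necessarily the one chosen in the actual change:

```xml
<!-- Sketch: default the fork-count variable in the parent pom so builds
     outside the TravisCI script stay parallel. -->
<properties>
    <flink.forkCountTestPackage>1C</flink.forkCountTestPackage>
</properties>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
                <forkCount>${flink.forkCountTestPackage}</forkCount>
            </configuration>
        </plugin>
    </plugins>
</build>
```

Without such a default, any build that does not pass `-Dflink.forkCountTestPackage=...` explicitly (as the TravisCI script does) falls back to an unset value and loses test parallelism.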
[
https://issues.apache.org/jira/browse/FLINK-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chesnay Schepler closed FLINK-6553.
---
Resolution: Not A Problem
Oh boy, I just forgot the generics on the original DataStream...
> Calls to getSideOutput() result in warnings regarding unchecked assignments
> ---
>
> Key: FLINK-6553
> URL: https://issues.apache.org/jira/browse/FLINK-6553
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
>Reporter: Chesnay Schepler
>
> Consider the following code:
> {code}
> operator = ...
> operator.getSideOutput(tag).map(MapFunction...)
> {code}
> When writing this you will get an unchecked assignment warning. You also
> can't let the IDE auto-generate the signature of the MapFunction; it will be
> typed to Object.
> It is also odd that the following actually compiles, but that may be a java
> issue:
> {code}
> operator.getSideOutput(new OutputTag(){})
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
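The "forgot the generics" resolution hinges on how Java anonymous subclasses preserve a type argument; a minimal self-contained sketch of the mechanism behind {{OutputTag}} (the `TypeRef` class here is illustrative, not Flink's):

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class SideOutputTypeCapture {

    // Illustrative stand-in for Flink's OutputTag: the anonymous subclass
    // (note the trailing {}) freezes the type argument so it survives erasure
    // and can be read back via reflection.
    abstract static class TypeRef<T> {
        Type captured() {
            return ((ParameterizedType) getClass().getGenericSuperclass())
                    .getActualTypeArguments()[0];
        }
    }

    public static void main(String[] args) {
        TypeRef<String> tag = new TypeRef<String>() {};
        System.out.println(tag.captured()); // class java.lang.String
        // A raw 'new TypeRef(){}' also compiles, which is why the raw
        // getSideOutput(new OutputTag(){}) call in the report compiled too:
        // raw types silently disable the generic check.
    }
}
```

If the upstream `DataStream` itself is raw (the generics were "forgotten"), everything downstream degrades to raw/`Object` typing, producing exactly the unchecked-assignment warnings described in the issue.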
[
https://issues.apache.org/jira/browse/FLINK-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006995#comment-16006995
]
Stefan Richter commented on FLINK-5053:
---
I have opened this jira to track everything that should still be fixed before
the release: https://issues.apache.org/jira/browse/FLINK-6537. Most of the work
is already done in https://github.com/apache/flink/pull/3870.
> Incremental / lightweight snapshots for checkpoints
> ---
>
> Key: FLINK-5053
> URL: https://issues.apache.org/jira/browse/FLINK-5053
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Xiaogang Shi
>
> There is currently basically no difference between savepoints and checkpoints
> in Flink and both are created through exactly the same process.
> However, savepoints and checkpoints have a slightly different meaning which
> we should take into account to keep Flink efficient:
> - Savepoints are (typically infrequently) triggered by the user to create a
> state from which the application can be restarted, e.g. because Flink, some
> code, or the parallelism needs to be changed.
> - Checkpoints are (typically frequently) triggered by the System to allow for
> fast recovery in case of failure, but keeping the job/system unchanged.
> This means that savepoints and checkpoints can have different properties in
> that:
> - Savepoint should represent a state of the application, where
> characteristics of the job (e.g. parallelism) can be adjusted for the next
> restart. One example for things that savepoints need to be aware of are
> key-groups. Savepoints can potentially be a little more expensive than
> checkpoints, because they are usually created a lot less frequently through
> the user.
> - Checkpoints are frequently triggered by the system to allow for fast
> failure recovery. However, failure recovery leaves all characteristics of the
> job unchanged. Thus checkpoints do not have to be aware of those, e.g. think
> again of key groups. Checkpoints should run faster than creating savepoints,
> in particular it would be nice to have incremental checkpoints.
> For a first approach, I would suggest the following steps/changes:
> - In checkpoint coordination: differentiate between triggering checkpoints
> and savepoints. Introduce properties for checkpoints that describe their set
> of abilities, e.g. "is-key-group-aware", "is-incremental".
> - In state handle infrastructure: introduce state handles that reflect
> incremental checkpoints and drop full key-group awareness, i.e. covering
> folders instead of files and not having keygroup_id -> file/offset mapping,
> but keygroup_range -> folder?
> - Backend side: We should start with RocksDB by reintroducing something
> similar to semi-async snapshots, but using
> BackupableDBOptions::setShareTableFiles(true) and transferring only new
> incremental outputs to HDFS. Notice that using RocksDB's internal backup
> mechanism is giving up on the information about individual key-groups. But as
> explained above, this should be totally acceptable for checkpoints, while
> savepoints should use the key-group-aware fully async mode. Of course we also
> need to implement the ability to restore from both types of snapshots.
> One problem in the suggested approach is still that even checkpoints should
> support scale-down, in case that only a smaller number of instances is left
> available in a recovery case.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
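The heart of the incremental idea sketched above, transferring only the files that are new since the previous snapshot, can be illustrated independently of RocksDB; all names here are illustrative, not Flink's API:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class IncrementalSnapshot {

    // Files referenced by the previous completed checkpoint vs. the files the
    // backend holds now; only the difference needs to be transferred to HDFS.
    static Set<String> filesToUpload(Set<String> previous, Set<String> current) {
        Set<String> delta = new TreeSet<>(current);
        delta.removeAll(previous);
        return delta;
    }

    public static void main(String[] args) {
        Set<String> prev = new HashSet<>(Arrays.asList("sst-001", "sst-002"));
        Set<String> curr = new HashSet<>(Arrays.asList("sst-002", "sst-003", "sst-004"));
        System.out.println(filesToUpload(prev, curr)); // [sst-003, sst-004]
    }
}
```

With RocksDB's shared-table-file backups (the `setShareTableFiles(true)` option mentioned above), unchanged SST files keep their names across backups, which is what makes this set-difference approach applicable.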
[
https://issues.apache.org/jira/browse/FLINK-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006987#comment-16006987
]
vishnu viswanath commented on FLINK-5053:
-
Is this task complete? I see all the pull requests for the subtasks are merged.
If the work is complete, I would like to try this out :). I have been waiting for
incremental checkpoints for one of the tasks that I am working on.
> Incremental / lightweight snapshots for checkpoints
> ---
>
> Key: FLINK-5053
> URL: https://issues.apache.org/jira/browse/FLINK-5053
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Xiaogang Shi
>
> There is currently basically no difference between savepoints and checkpoints
> in Flink and both are created through exactly the same process.
> However, savepoints and checkpoints have a slightly different meaning which
> we should take into account to keep Flink efficient:
> - Savepoints are (typically infrequently) triggered by the user to create a
> state from which the application can be restarted, e.g. because Flink, some
> code, or the parallelism needs to be changed.
> - Checkpoints are (typically frequently) triggered by the System to allow for
> fast recovery in case of failure, but keeping the job/system unchanged.
> This means that savepoints and checkpoints can have different properties in
> that:
> - Savepoint should represent a state of the application, where
> characteristics of the job (e.g. parallelism) can be adjusted for the next
> restart. One example of something that savepoints need to be aware of is
> key-groups. Savepoints can potentially be a little more expensive than
> checkpoints, because they are usually created far less frequently, by the user.
> - Checkpoints are frequently triggered by the system to allow for fast
> failure recovery. However, failure recovery leaves all characteristics of the
> job unchanged. These checkpoints do not have to be aware of those, e.g. think
> again of key groups. Checkpoints should run faster than creating savepoints,
> in particular it would be nice to have incremental checkpoints.
> For a first approach, I would suggest the following steps/changes:
> - In checkpoint coordination: differentiate between triggering checkpoints
> and savepoints. Introduce properties for checkpoints that describe their set
> of abilities, e.g. "is-key-group-aware", "is-incremental".
> - In state handle infrastructure: introduce state handles that reflect
> incremental checkpoints and drop full key-group awareness, i.e. covering
> folders instead of files and not having keygroup_id -> file/offset mapping,
> but keygroup_range -> folder?
> - Backend side: We should start with RocksDB by reintroducing something
> similar to semi-async snapshots, but using
> BackupableDBOptions::setShareTableFiles(true) and transferring only new
> incremental outputs to HDFS. Notice that using RocksDB's internal backup
> mechanism is giving up on the information about individual key-groups. But as
> explained above, this should be totally acceptable for checkpoints, while
> savepoints should use the key-group-aware fully async mode. Of course we also
> need to implement the ability to restore from both types of snapshots.
> One problem with the suggested approach is that even checkpoints should
> support scale-down, in case only a smaller number of instances is left
> available upon recovery.
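The checkpoint/savepoint property split in the first step above could be sketched roughly as follows; the class and flag names here are hypothetical illustrations, not Flink's actual API:

```java
// Hypothetical sketch of the proposed property split between savepoints and
// checkpoints; names are illustrative, not Flink's real classes.
public class CheckpointProperties {

    // savepoints must track key-groups so parallelism can change on restart
    public final boolean isKeyGroupAware;
    // checkpoints may ship only state that changed since the last snapshot
    public final boolean isIncremental;

    CheckpointProperties(boolean isKeyGroupAware, boolean isIncremental) {
        this.isKeyGroupAware = isKeyGroupAware;
        this.isIncremental = isIncremental;
    }

    // Savepoints: key-group aware, always full snapshots.
    public static CheckpointProperties forSavepoint() {
        return new CheckpointProperties(true, false);
    }

    // Checkpoints: may drop key-group awareness and be incremental for speed.
    public static CheckpointProperties forCheckpoint(boolean incremental) {
        return new CheckpointProperties(false, incremental);
    }

    public static void main(String[] args) {
        System.out.println("savepoint key-group aware: " + forSavepoint().isKeyGroupAware);
        System.out.println("checkpoint incremental: " + forCheckpoint(true).isIncremental);
    }
}
```

The checkpoint coordinator would then dispatch on these flags when choosing the snapshot mode for a backend.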
--
[
https://issues.apache.org/jira/browse/FLINK-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen closed FLINK-6414.
---
> Use scala.binary.version in place of change-scala-version.sh
>
>
> Key: FLINK-6414
> URL: https://issues.apache.org/jira/browse/FLINK-6414
> Project: Flink
> Issue Type: Improvement
> Components: Build System
>Affects Versions: 1.3.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.4.0
>
>
> Recent commits have failed to modify {{change-scala-version.sh}} resulting in
> broken builds for {{scala-2.11}}. It looks like we can remove the need for
> this script by replacing hard-coded references to the Scala version with
> Flink's maven variable {{scala.binary.version}}.
> I had initially realized that the change script is [only used for
> building|https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#scala-versions]
> and not for switching the IDE environment.
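As an illustration of the proposed change, a dependency on a Scala-suffixed artifact goes from a hard-coded suffix (which `change-scala-version.sh` had to rewrite) to the Maven property; the artifact name below is just an example:

```xml
<!-- Before: Scala version hard-coded in the artifactId -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.10</artifactId>
    <version>${project.version}</version>
</dependency>

<!-- After: the suffix comes from the scala.binary.version property -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
    <version>${project.version}</version>
</dependency>
```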
--
[
https://issues.apache.org/jira/browse/FLINK-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen resolved FLINK-6414.
-
Resolution: Fixed
Fix Version/s: 1.4.0
Fixed via 35c087129e2a27c2db47c5ed5ce3fb3523a7c719
> Use scala.binary.version in place of change-scala-version.sh
>
>
> Key: FLINK-6414
> URL: https://issues.apache.org/jira/browse/FLINK-6414
> Project: Flink
> Issue Type: Improvement
> Components: Build System
>Affects Versions: 1.3.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.4.0
>
>
> Recent commits have failed to modify {{change-scala-version.sh}} resulting in
> broken builds for {{scala-2.11}}. It looks like we can remove the need for
> this script by replacing hard-coded references to the Scala version with
> Flink's maven variable {{scala.binary.version}}.
> I had initially realized that the change script is [only used for
> building|https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#scala-versions]
> and not for switching the IDE environment.
--
[
https://issues.apache.org/jira/browse/FLINK-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006960#comment-16006960
]
ASF GitHub Bot commented on FLINK-6466:
---
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3832
+1 for 1.4 - just spent some good time cleaning up dependencies (also from
Hadoop) and this will almost certainly require another such pass...
> Build Hadoop 2.8.0 convenience binaries
> ---
>
> Key: FLINK-6466
> URL: https://issues.apache.org/jira/browse/FLINK-6466
> Project: Flink
> Issue Type: New Feature
> Components: Build System
>Affects Versions: 1.3.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.3.0
>
>
> As discussed on the dev mailing list, add Hadoop 2.8 to the
> {{create_release_files.sh}} script and TravisCI test matrix.
> If there is consensus then references to binaries for old versions of Hadoop
> could be removed.
--
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3832
+1 for 1.4 - just spent some good time cleaning up dependencies (also from
Hadoop) and this will almost certainly require another such pass...
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[
https://issues.apache.org/jira/browse/FLINK-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006955#comment-16006955
]
ASF GitHub Bot commented on FLINK-6414:
---
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3800
@rmetzger I have tried to put this into 1.3 as well - seems to work...
> Use scala.binary.version in place of change-scala-version.sh
>
>
> Key: FLINK-6414
> URL: https://issues.apache.org/jira/browse/FLINK-6414
> Project: Flink
> Issue Type: Improvement
> Components: Build System
>Affects Versions: 1.3.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Recent commits have failed to modify {{change-scala-version.sh}} resulting in
> broken builds for {{scala-2.11}}. It looks like we can remove the need for
> this script by replacing hard-coded references to the Scala version with
> Flink's maven variable {{scala.binary.version}}.
> I had initially realized that the change script is [only used for
> building|https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#scala-versions]
> and not for switching the IDE environment.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3800
@rmetzger I have tried to put this into 1.3 as well - seems to work...
---
[
https://issues.apache.org/jira/browse/FLINK-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen updated FLINK-6541:
Priority: Critical (was: Major)
> Jar upload directory not created
>
>
> Key: FLINK-6541
> URL: https://issues.apache.org/jira/browse/FLINK-6541
> Project: Flink
> Issue Type: Bug
> Components: Webfrontend
>Affects Versions: 1.2.0
>Reporter: Andrey
>Priority: Critical
>
> Steps to reproduce:
> * setup configuration property: jobmanager.web.tmpdir = /mnt/flink/web
> * this directory should not exist
> * Run flink job manager.
> * in logs:
> {code}
> 2017-05-11 12:07:58,397 ERROR
> org.apache.flink.runtime.webmonitor.WebMonitorUtils - WebServer
> could not be created [main]
> java.io.IOException: Jar upload directory
> /mnt/flink/web/flink-web-3f2733c3-6f4c-4311-b617-1e93d9535421 cannot be
> created or is not writable.
> {code}
> Expected:
> * create parent directories if they do not exist, i.e. use
> "uploadDir.mkdirs()" instead of "uploadDir.mkdir()"
> Note:
> * BlobServer creates parent directories (see BlobUtils storageDir.mkdirs())
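A minimal stand-alone demonstration of the difference between the two calls (not the WebMonitor code itself):

```java
import java.io.File;

public class MkdirsDemo {

    // Returns {mkdirResult, mkdirsResult} for a path whose parent is missing.
    public static boolean[] tryCreate(File nested) {
        boolean single = nested.mkdir();   // fails: the parent does not exist yet
        boolean chain = nested.mkdirs();   // succeeds: creates the whole chain
        return new boolean[] { single, chain };
    }

    public static void main(String[] args) throws Exception {
        File base = java.nio.file.Files.createTempDirectory("web-root").toFile();
        boolean[] r = tryCreate(new File(base, "missing-parent/upload"));
        System.out.println(r[0] + " " + r[1]); // false true
    }
}
```

This is exactly why BlobUtils uses `mkdirs()` for its storage directory.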
--
[
https://issues.apache.org/jira/browse/FLINK-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006952#comment-16006952
]
Stephan Ewen commented on FLINK-6541:
-
Good finds!
Do you want to provide a pull request for these?
> Jar upload directory not created
>
>
> Key: FLINK-6541
> URL: https://issues.apache.org/jira/browse/FLINK-6541
> Project: Flink
> Issue Type: Bug
> Components: Webfrontend
>Affects Versions: 1.2.0
>Reporter: Andrey
>
> Steps to reproduce:
> * setup configuration property: jobmanager.web.tmpdir = /mnt/flink/web
> * this directory should not exist
> * Run flink job manager.
> * in logs:
> {code}
> 2017-05-11 12:07:58,397 ERROR
> org.apache.flink.runtime.webmonitor.WebMonitorUtils - WebServer
> could not be created [main]
> java.io.IOException: Jar upload directory
> /mnt/flink/web/flink-web-3f2733c3-6f4c-4311-b617-1e93d9535421 cannot be
> created or is not writable.
> {code}
> Expected:
> * create parent directories if they do not exist, i.e. use
> "uploadDir.mkdirs()" instead of "uploadDir.mkdir()"
> Note:
> * BlobServer creates parent directories (see BlobUtils storageDir.mkdirs())
--
[
https://issues.apache.org/jira/browse/FLINK-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006942#comment-16006942
]
Stephan Ewen commented on FLINK-6554:
-
+1, incompatibility is always possible and should lead to a graceful failure
with a good error message
> CompatibilityResult should contain a notCompatible() option
> ---
>
> Key: FLINK-6554
> URL: https://issues.apache.org/jira/browse/FLINK-6554
> Project: Flink
> Issue Type: Improvement
> Components: Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Priority: Minor
>
> The {{CompatibilityResult}} allows a {{TypeSerializer}} to specify whether it
> is compatible based on the given {{TypeSerializerConfigSnapshot}}.
> As it stands the only options are {{compatible}} and {{requiresMigration}}.
> We should allow serializers to also notify the system of an incompatibility
> which should then fail the job.
> This would for example be required when a serializer provides an upgrade path
> version1 -> version2 -> version3, but not directly from version1 -> version3.
> Currently, the serializer would either have to contain logic to upgrade from
> every single previous version or simply throw an exception on its own.
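A rough sketch of what the proposed third option might look like; this is illustrative only, and Flink's actual {{CompatibilityResult}} API may end up different:

```java
// Illustrative sketch of a CompatibilityResult with a notCompatible() option;
// not Flink's real class.
public final class CompatibilityResult {

    public enum Type { COMPATIBLE, REQUIRES_MIGRATION, NOT_COMPATIBLE }

    private final Type type;
    private final String message; // reason shown when the job is failed

    private CompatibilityResult(Type type, String message) {
        this.type = type;
        this.message = message;
    }

    public static CompatibilityResult compatible() {
        return new CompatibilityResult(Type.COMPATIBLE, null);
    }

    public static CompatibilityResult requiresMigration() {
        return new CompatibilityResult(Type.REQUIRES_MIGRATION, null);
    }

    // The proposed addition: fail the restore gracefully with a good message,
    // e.g. "no upgrade path from version1 to version3".
    public static CompatibilityResult notCompatible(String message) {
        return new CompatibilityResult(Type.NOT_COMPATIBLE, message);
    }

    public Type getType() { return type; }
    public String getMessage() { return message; }

    public static void main(String[] args) {
        System.out.println(notCompatible("no direct upgrade path").getType());
    }
}
```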
--
[
https://issues.apache.org/jira/browse/FLINK-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006947#comment-16006947
]
Stephan Ewen commented on FLINK-6553:
-
Is this simply a missing {{@SafeVarargs}} or more?
> Calls to getSideOutput() result in warnings regarding unchecked assignments
> ---
>
> Key: FLINK-6553
> URL: https://issues.apache.org/jira/browse/FLINK-6553
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
>Reporter: Chesnay Schepler
>
> Consider the following code:
> {code}
> operator = ...
> operator.getSideOutput(tag).map(MapFunction...)
> {code}
> When writing this you will get an unchecked assignment warning. You also
> can't let the IDE auto-generate the signature of the MapFunction; it will be
> typed to Object.
> It is also odd that the following actually compiles, but that may be a Java
> issue:
> {code}
> operator.getSideOutput(new OutputTag(){})
> {code}
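The second oddity is general Java behavior: an anonymous subclass of a *raw* generic type compiles (with a rawtypes warning), but loses the type argument that a properly parameterized anonymous subclass would capture. A simplified stand-in (not Flink's OutputTag) shows this:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Simplified stand-in for an OutputTag-like class that captures its type
// argument via an anonymous subclass; not the actual Flink class.
abstract class Tag<T> {
    Type captured() {
        Type sup = getClass().getGenericSuperclass();
        return (sup instanceof ParameterizedType)
                ? ((ParameterizedType) sup).getActualTypeArguments()[0]
                : null; // a raw `new Tag() {}` captures nothing
    }
}

public class RawTagDemo {
    @SuppressWarnings("rawtypes")
    public static void main(String[] args) {
        Tag<String> typed = new Tag<String>() {};
        Tag raw = new Tag() {};               // compiles, but the element type is lost
        System.out.println(typed.captured()); // class java.lang.String
        System.out.println(raw.captured());   // null
    }
}
```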
--
[
https://issues.apache.org/jira/browse/FLINK-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen updated FLINK-6560:
Priority: Critical (was: Minor)
> Restore maven parallelism in flink-tests
>
>
> Key: FLINK-6560
> URL: https://issues.apache.org/jira/browse/FLINK-6560
> Project: Flink
> Issue Type: Bug
> Components: Build System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Critical
> Fix For: 1.3.0, 1.4.0
>
>
> FLINK-6506 added the maven variable {{flink.forkCountTestPackage}} which is
> used by the TravisCI script but no default value is set.
--
[
https://issues.apache.org/jira/browse/FLINK-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006939#comment-16006939
]
Stephan Ewen commented on FLINK-6560:
-
+1, I also see longer build time locally
> Restore maven parallelism in flink-tests
>
>
> Key: FLINK-6560
> URL: https://issues.apache.org/jira/browse/FLINK-6560
> Project: Flink
> Issue Type: Bug
> Components: Build System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Critical
> Fix For: 1.3.0, 1.4.0
>
>
> FLINK-6506 added the maven variable {{flink.forkCountTestPackage}} which is
> used by the TravisCI script but no default value is set.
--
[
https://issues.apache.org/jira/browse/FLINK-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen reassigned FLINK-6514:
---
Assignee: Stephan Ewen
> Cannot start Flink Cluster in standalone mode
> -
>
> Key: FLINK-6514
> URL: https://issues.apache.org/jira/browse/FLINK-6514
> Project: Flink
> Issue Type: Bug
> Components: Build System, Cluster Management
>Reporter: Aljoscha Krettek
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.3.0
>
>
> The changes introduced for FLINK-5998 change what goes into the
> {{flink-dist}} fat jar. As it is, this means that trying to start a cluster
> results in a {{ClassNotFoundException}} of {{LogFactory}} in
> {{commons-logging}}.
> One solution is to now make the shaded Hadoop jar a proper fat-jar, so that
> we again have all the dependencies.
--
[
https://issues.apache.org/jira/browse/FLINK-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006916#comment-16006916
]
ASF GitHub Bot commented on FLINK-6555:
---
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/3873#discussion_r116062457
--- Diff:
flink-runtime/src/main/java/org/apache/flink/runtime/concurrent/FutureUtils.java
---
@@ -163,26 +165,31 @@ public static ConjunctFuture combineAll(Collection> futures)
* Implementation notice: The member fields all have package-private
access, because they are
* either accessed by an inner subclass or by the enclosing class.
*/
- private static class ConjunctFutureImpl extends
FlinkCompletableFuture implements ConjunctFuture {
+ private static class ConjunctFutureImpl extends
FlinkCompletableFuture implements ConjunctFuture {
/** The total number of futures in the conjunction */
final int numTotal;
/** The number of futures in the conjunction that are already
complete */
final AtomicInteger numCompleted = new AtomicInteger();
+ final ArrayList results;
+
/** The function that is attached to all futures in the
conjunction. Once a future
* is complete, this function tracks the completion or fails
the conjunct.
*/
- final BiFunction completionHandler =
new BiFunction() {
+ final BiFunction completionHandler = new
BiFunction() {
@Override
- public Void apply(Object o, Throwable throwable) {
+ public Void apply(T o, Throwable throwable) {
if (throwable != null) {
completeExceptionally(throwable);
- }
- else if (numTotal ==
numCompleted.incrementAndGet()) {
- complete(null);
+ } else {
+ results.add(o);
--- End diff --
Is this thread safe? My assumption is that many of the completion handlers
can be called at the same time.
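The concern is real for a plain {{ArrayList}}: the completion handlers may run on different threads, so the shared results list must be guarded. A stdlib sketch of the safe variant (a synchronized list; names are illustrative, this is not Flink's ConjunctFutureImpl):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ConcurrentAddDemo {

    // Many completion handlers may call results.add() at the same time. A plain
    // ArrayList can then lose elements or throw; wrapping it in a synchronized
    // list (or synchronizing every add) makes the final count reliable.
    public static int addFromManyThreads(int threads, int addsPerThread) throws Exception {
        List<Integer> results = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    results.add(i);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return results.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addFromManyThreads(8, 1000)); // 8000
    }
}
```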
> Generalize ConjunctFuture
> -
>
> Key: FLINK-6555
> URL: https://issues.apache.org/jira/browse/FLINK-6555
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
>Affects Versions: 1.4.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Trivial
> Fix For: 1.4.0
>
>
> The {{ConjunctFuture}} allows to combine multiple {{Futures}} into one. At
> the moment it does not return the collection of results of the individuals
> futures. In some cases this information is helpful and should, thus, be
> returned.
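With Java 8 stdlib futures, the described generalization looks roughly like this; a sketch of the idea, not Flink's actual ConjunctFuture API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Combine many futures into one future that completes with the list of their
// individual results, rather than with Void.
public class CombineAll {

    public static <T> CompletableFuture<List<T>> combineAll(List<CompletableFuture<T>> futures) {
        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<T> results = new ArrayList<>(futures.size());
                    for (CompletableFuture<T> f : futures) {
                        results.add(f.join()); // already complete, join() cannot block
                    }
                    return results;
                });
    }

    public static void main(String[] args) {
        List<CompletableFuture<Integer>> fs = Arrays.asList(
                CompletableFuture.completedFuture(1),
                CompletableFuture.completedFuture(2),
                CompletableFuture.completedFuture(3));
        System.out.println(combineAll(fs).join()); // [1, 2, 3]
    }
}
```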
--
[
https://issues.apache.org/jira/browse/FLINK-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006903#comment-16006903
]
Stephan Ewen commented on FLINK-6514:
-
There are more issues. The way this is set up right now also drops transitive
dependencies from Flink that should not be dropped (compression, parts of Avro
support, ...)
I have a more comprehensive fix for this...
> Cannot start Flink Cluster in standalone mode
> -
>
> Key: FLINK-6514
> URL: https://issues.apache.org/jira/browse/FLINK-6514
> Project: Flink
> Issue Type: Bug
> Components: Build System, Cluster Management
>Reporter: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.3.0
>
>
> The changes introduced for FLINK-5998 change what goes into the
> {{flink-dist}} fat jar. As it is, this means that trying to start a cluster
> results in a {{ClassNotFoundException}} of {{LogFactory}} in
> {{commons-logging}}.
> One solution is to now make the shaded Hadoop jar a proper fat-jar, so that
> we again have all the dependencies.
--
Greg Hogan created FLINK-6560:
-
Summary: Restore maven parallelism in flink-tests
Key: FLINK-6560
URL: https://issues.apache.org/jira/browse/FLINK-6560
Project: Flink
Issue Type: Bug
Components: Build System
Affects Versions: 1.3.0, 1.4.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor
Fix For: 1.3.0, 1.4.0
FLINK-6506 added the maven variable {{flink.forkCountTestPackage}} which is
used by the TravisCI script but no default value is set.
--
[
https://issues.apache.org/jira/browse/FLINK-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Hogan closed FLINK-6393.
-
Resolution: Implemented
Fix Version/s: 1.4.0
1.3.0
master: 3ee8c69aa4390a8d51b33f262f719fb1a5474d51
release-1.3: cfaecda3ac9a736213f8bfe643b8f57ce492e243
> Add Evenly Graph Generator to Flink Gelly
> -
>
> Key: FLINK-6393
> URL: https://issues.apache.org/jira/browse/FLINK-6393
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
>Reporter: FlorianFan
>Assignee: FlorianFan
>Priority: Minor
> Fix For: 1.3.0, 1.4.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> An evenly distributed graph is one in which every vertex has the same degree,
> so the edges of the graph are distributed evenly. When the vertex degree is 0,
> an empty graph is generated; when the vertex degree is vertex count - 1, a
> complete graph is generated.
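A minimal sketch of such a generator (illustrative, not Gelly's implementation): link vertex v to the next `degree` vertices modulo the vertex count, so every vertex ends up with the same out-degree.

```java
import java.util.ArrayList;
import java.util.List;

// Generate a regular ("evenly") directed graph as an edge list: each vertex v
// points to (v+1) .. (v+degree) mod vertexCount, giving every vertex the same
// out-degree. degree 0 yields an empty graph; degree vertexCount-1 yields a
// complete graph.
public class EvenlyGraph {

    public static List<long[]> edges(long vertexCount, long degree) {
        List<long[]> edges = new ArrayList<>();
        for (long v = 0; v < vertexCount; v++) {
            for (long k = 1; k <= degree; k++) {
                edges.add(new long[] { v, (v + k) % vertexCount });
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        System.out.println(edges(5, 2).size()); // 10 edges: 5 vertices, out-degree 2
    }
}
```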
--
Stephan Ewen created FLINK-6559:
---
Summary: Rename 'slaves' to 'workers'
Key: FLINK-6559
URL: https://issues.apache.org/jira/browse/FLINK-6559
Project: Flink
Issue Type: Improvement
Components: Startup Shell Scripts
Reporter: Stephan Ewen
Priority: Minor
It just feels a little nicer to call them "workers"
--
[
https://issues.apache.org/jira/browse/FLINK-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006810#comment-16006810
]
ASF GitHub Bot commented on FLINK-6558:
---
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3874
Looks good to me, +1 to merge this...
> Yarn tests fail on Windows
> --
>
> Key: FLINK-6558
> URL: https://issues.apache.org/jira/browse/FLINK-6558
> Project: Flink
> Issue Type: Improvement
> Components: Tests, YARN
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> The following yarn tests fail on Windows since they try to start-up an HDFS
> cluster. This requires the Windows hadoop extensions which we can't ship, so
> these tests should be disabled on Windows.
> {code}
> YarnIntraNonHaMasterServicesTest
> YarnPreConfiguredMasterHaServicesTest
> YarnApplicationMasterRunnerTest
> {code}
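The usual guard for such tests is an OS check; in JUnit this would typically feed {{Assume.assumeFalse(...)}} so the test is skipped rather than failed. A plain-Java sketch of the check (names are illustrative):

```java
// Guard for tests that cannot run on Windows, e.g. because they start an HDFS
// mini-cluster and the Windows hadoop native extensions are not shipped.
public class OsGuard {

    public static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase().startsWith("windows");
    }

    public static void main(String[] args) {
        if (isWindows()) {
            System.out.println("Skipping HDFS-based test: no Windows hadoop extensions");
            return;
        }
        // ... start the HDFS-cluster-based test here ...
    }
}
```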
--
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3874
Looks good to me, +1 to merge this...
---
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3871
+1, looks good
---
[
https://issues.apache.org/jira/browse/FLINK-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006685#comment-16006685
]
Kostas Kloudas edited comment on FLINK-5753 at 5/11/17 4:59 PM:
Hi [~jurijuri]! Thanks for reporting this!
I assume that your job is operating in {{processingTime}}, right?
In this case, this is (unfortunately) expected to happen, as in processing time
it is only events that trigger computations, not timers.
The reason is that, contrary to event time, where watermarks define a coarser
time granularity, in processing time there is no such thing, so we use event
arrival to regulate how frequently processing happens. If we were to register
timers in processing time that trigger computation when they fire, then:
1) we would have to introduce another parameter (sth like a timer interval) that
would imitate the watermark
2) we would have to register a timer for each incoming element (which can lead
to a significant increase in storage requirements)
Currently we also do this for event time timers, but for event time we have
ideas on how to change it in upcoming releases.
Is this a blocker for your usecase, or did you just work in processing time for
debugging while your actual usecase uses event time?
was (Author: kkl0u):
Hi [~jurijuri]! Thanks for reporting this!
I assume that your job is operating in {{processingTime}} right?
In this case, this is (unfortunately) expected to happen as in processing time,
it is events only that trigger computations and not timers.
The reason is that contrary to event time, where watermarks define a coarser
time granularity, on event time there is not such thing, so we use event
arrival to regulate the
frequency processing happens. If it were to register timers in processing time
that will trigger computation when they fire, then:
1) we would have to introduce another parameter (sth like timer interval) that
would imitate the watermark
2) we would have to register a timer for each incoming element (which can lead
to a
significant increase in storage requirements)
Currently we also do it for event time timers, but for event time we have ideas
on how to change it for upcoming releases.
Is this a blocker for your usecase, or you just worked in processing time for
debugging and your usecase uses processing time?
> CEP timeout handler.
>
>
> Key: FLINK-5753
> URL: https://issues.apache.org/jira/browse/FLINK-5753
> Project: Flink
> Issue Type: Bug
> Components: CEP
>Affects Versions: 1.1.2
>Reporter: MichaĆ Jurkiewicz
>
> I configured the following flink job in my environment:
> {code}
> Pattern patternCommandStarted = Pattern.
> begin("event-accepted").subtype(Event.class)
> .where(e -> {event accepted where
> statement}).next("second-event-started").subtype(Event.class)
> .where(e -> {event started where statement}))
> .within(Time.seconds(30));
> DataStream> events = CEP
> .pattern(eventsStream.keyBy(e -> e.getEventProperties().get("deviceCode")),
> patternCommandStarted)
> .select(eventSelector, eventSelector);
> static class EventSelector implements PatternSelectFunction,
> PatternTimeoutFunction {}
> {code}
> The problem that I have is related to timeout handling. I observed that:
> if the first event appears, the second event does not appear in the stream,
> and *no new events appear in the stream*, the timeout handler is not executed.
> Expected result: the timeout handler should be executed even when there are no
> new events in the stream
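The reported behavior follows from processing time being purely event-driven; detecting the timeout requires an explicit timer. A stdlib analogy (unrelated to Flink's internals): a bounded wait fires even when no further event ever arrives.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Waiting for a second event with a deadline: if nothing arrives within
// timeoutMs, the timeout branch fires even though the stream is silent.
public class TimeoutDemo {

    public static String awaitSecondEvent(BlockingQueue<String> events, long timeoutMs)
            throws InterruptedException {
        String second = events.poll(timeoutMs, TimeUnit.MILLISECONDS);
        return second != null ? second : "TIMEOUT"; // timer fires with no traffic
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        System.out.println(awaitSecondEvent(events, 100)); // TIMEOUT
    }
}
```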
--
Github user asfgit closed the pull request at:
https://github.com/apache/flink/pull/3802
---
Github user asfgit closed the pull request at:
https://github.com/apache/flink/pull/3866
---
[
https://issues.apache.org/jira/browse/FLINK-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen resolved FLINK-6501.
-
Resolution: Fixed
Fixed in
- 1.3.0 via 95fd2d371eae43be1ea2534efe653af1a7095d0b
- 1.4.0 via 609bfa1db66e98e82792b6140748f14d10b79209
> Make sure NOTICE files are bundled into shaded JAR files
>
>
> Key: FLINK-6501
> URL: https://issues.apache.org/jira/browse/FLINK-6501
> Project: Flink
> Issue Type: Bug
> Components: Build System
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.3.0, 1.4.0
>
>
--
[
https://issues.apache.org/jira/browse/FLINK-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen closed FLINK-6515.
---
> KafkaConsumer checkpointing fails because of ClassLoader issues
> ---
>
> Key: FLINK-6515
> URL: https://issues.apache.org/jira/browse/FLINK-6515
> Project: Flink
> Issue Type: Bug
> Components: Kafka Connector
>Affects Versions: 1.3.0
>Reporter: Aljoscha Krettek
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.3.0, 1.4.0
>
>
> A job with Kafka and checkpointing enabled fails with:
> {code}
> java.lang.Exception: Error while triggering checkpoint 1 for Source: Custom
> Source -> Map -> Sink: Unnamed (1/1)
> at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1195)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not perform checkpoint 1 for operator
> Source: Custom Source -> Map -> Sink: Unnamed (1/1).
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:526)
> at
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.triggerCheckpoint(SourceStreamTask.java:112)
> at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1184)
> ... 5 more
> Caused by: java.lang.Exception: Could not complete snapshot 1 for operator Source: Custom Source -> Map -> Sink: Unnamed (1/1).
> at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
> at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1157)
> at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1089)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:653)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:589)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:520)
> ... 7 more
> Caused by: java.lang.RuntimeException: Could not copy instance of (KafkaTopicPartition{topic='test-input', partition=0},-1).
> at org.apache.flink.runtime.state.JavaSerializer.copy(JavaSerializer.java:54)
> at org.apache.flink.runtime.state.JavaSerializer.copy(JavaSerializer.java:32)
> at org.apache.flink.runtime.state.ArrayListSerializer.copy(ArrayListSerializer.java:71)
> at org.apache.flink.runtime.state.DefaultOperatorStateBackend$PartitionableListState.<init>(DefaultOperatorStateBackend.java:368)
> at org.apache.flink.runtime.state.DefaultOperatorStateBackend$PartitionableListState.deepCopy(DefaultOperatorStateBackend.java:380)
> at org.apache.flink.runtime.state.DefaultOperatorStateBackend.snapshot(DefaultOperatorStateBackend.java:191)
> at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:392)
> ... 12 more
> Caused by: java.lang.ClassNotFoundException: org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:64)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
> at
>
[
https://issues.apache.org/jira/browse/FLINK-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen closed FLINK-6508.
---
> Include license files of packaged dependencies
> --
>
> Key: FLINK-6508
> URL: https://issues.apache.org/jira/browse/FLINK-6508
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
>Affects Versions: 1.3.0, 1.2.1, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>Priority: Blocker
> Fix For: 1.3.0, 1.4.0
>
>
> The Maven artifact for flink-table bundles its (non-Flink) dependencies to
> have a self-contained JAR file that can be moved to the ./lib folder without
> adding additional dependencies.
> Currently, we include Apache Calcite, Guava (relocated and required by
> Calcite), Janino, and Reflections.
> Janino and Reflections are not under the Apache license, so we need to include
> their license files in the JAR file.
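A hedged sketch of one way to bundle such license files with the maven-shade-plugin's IncludeResourceTransformer (the file names and paths are illustrative; the actual flink-table pom may configure this differently):

```xml
<!-- Sketch only: resource/file paths below are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <!-- Copies a license file from the source tree into the shaded jar. -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
        <resource>META-INF/licenses/LICENSE.janino.txt</resource>
        <file>licenses/LICENSE.janino.txt</file>
      </transformer>
    </transformers>
  </configuration>
</plugin>
```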
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen resolved FLINK-6508.
-
Resolution: Fixed
Fix Version/s: 1.4.0
Fixed in
- 1.3.0 via 1b2173712dea5a5b95633af82623bd45785965ce
- 1.4.0 via 2ff5931982111f37dd51895b7110c6074cb53276
> Include license files of packaged dependencies
> --
>
> Key: FLINK-6508
> URL: https://issues.apache.org/jira/browse/FLINK-6508
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
>Affects Versions: 1.3.0, 1.2.1, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>Priority: Blocker
> Fix For: 1.3.0, 1.4.0
>
>
> The Maven artifact for flink-table bundles its (non-Flink) dependencies to
> have a self-contained JAR file that can be moved to the ./lib folder without
> adding additional dependencies.
> Currently, we include Apache Calcite, Guava (relocated and required by
> Calcite), Janino, and Reflections.
> Janino and Reflections are not under the Apache license, so we need to include
> their license files in the JAR file.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006719#comment-16006719
]
Stephan Ewen commented on FLINK-5679:
-
{{PartitionedStateCheckpointingITCase}} refactored in
- 1.3 via 52d069504b0360fd8a4f0977bbb0c4bc84fd8c01
- 1.4 via 72aa2622299ea492c74c4ab0d3e00f0d323df4a9
> Refactor *CheckpointedITCase tests to speed up
> ---
>
> Key: FLINK-5679
> URL: https://issues.apache.org/jira/browse/FLINK-5679
> Project: Flink
> Issue Type: Test
> Components: Tests
>Reporter: Andrew Efimov
>Assignee: Andrew Efimov
> Labels: test-framework
> Fix For: 1.3.0
>
>
> Refactoring these tests to speed them up:
> {noformat}
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 40.193 sec -
> in org.apache.flink.test.checkpointing.StreamCheckpointingITCase
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 119.063 sec -
> in org.apache.flink.test.checkpointing.UdfStreamOperatorCheckpointingITCase
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 47.525 sec -
> in org.apache.flink.test.checkpointing.PartitionedStateCheckpointingITCase
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 40.355 sec -
> in org.apache.flink.test.checkpointing.EventTimeAllWindowCheckpointingITCase
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 51.615 sec -
> in org.apache.flink.test.checkpointing.StateCheckpointedITCase
> {noformat}
> Tests could be adjusted in a similar way to save some time (some may actually
> even be redundant by now):
> https://github.com/StephanEwen/incubator-flink/commit/0dd7ae693f30585283d334a1d65b3d8222b7ca5c
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen closed FLINK-6531.
---
> Deserialize checkpoint hooks with user classloader
> --
>
> Key: FLINK-6531
> URL: https://issues.apache.org/jira/browse/FLINK-6531
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
>Reporter: Eron Wright
>Assignee: Eron Wright
>Priority: Blocker
> Fix For: 1.3.0, 1.4.0
>
>
> The checkpoint hooks introduced in FLINK-6390 aren't being deserialized with
> the user classloader, breaking remote execution.
> Remote execution produces a `ClassNotFoundException` as the job graph is
> transferred from the client to the JobManager.
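A minimal sketch of the general pattern behind the fix (our simplification, not Flink's actual `InstantiationUtil` code): deserialize with an `ObjectInputStream` that resolves classes against a caller-supplied classloader, so classes visible only to the user classloader can still be restored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class ClassLoaderDeserializer {

    /** ObjectInputStream that resolves classes via a supplied classloader. */
    static final class UserCodeObjectInputStream extends ObjectInputStream {
        private final ClassLoader classLoader;

        UserCodeObjectInputStream(InputStream in, ClassLoader classLoader)
                throws IOException {
            super(in);
            this.classLoader = classLoader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws ClassNotFoundException {
            // Look classes up in the user classloader, not the default one.
            return Class.forName(desc.getName(), false, classLoader);
        }
    }

    /** Serializes an object, then deserializes it with the given loader. */
    static Object roundTrip(Serializable obj, ClassLoader classLoader)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new UserCodeObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()), classLoader)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object restored = roundTrip("checkpoint-hook",
                Thread.currentThread().getContextClassLoader());
        System.out.println(restored);
    }
}
```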
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen resolved FLINK-6531.
-
Resolution: Fixed
Fix Version/s: 1.4.0
Fixed in
- 1.3.0 via 11a7f466ea14478970723704d4477afea41b4e43
- 1.4.0 via aa8a90a588b4d72fc585731bea233495f0690364
> Deserialize checkpoint hooks with user classloader
> --
>
> Key: FLINK-6531
> URL: https://issues.apache.org/jira/browse/FLINK-6531
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
>Reporter: Eron Wright
>Assignee: Eron Wright
>Priority: Blocker
> Fix For: 1.3.0, 1.4.0
>
>
> The checkpoint hooks introduced in FLINK-6390 aren't being deserialized with
> the user classloader, breaking remote execution.
> Remote execution produces a `ClassNotFoundException` as the job graph is
> transferred from the client to the JobManager.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006709#comment-16006709
]
ASF GitHub Bot commented on FLINK-6414:
---
Github user asfgit closed the pull request at:
https://github.com/apache/flink/pull/3800
> Use scala.binary.version in place of change-scala-version.sh
>
>
> Key: FLINK-6414
> URL: https://issues.apache.org/jira/browse/FLINK-6414
> Project: Flink
> Issue Type: Improvement
> Components: Build System
>Affects Versions: 1.3.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Recent commits have failed to modify {{change-scala-version.sh}}, resulting in
> broken builds for {{scala-2.11}}. It looks like we can remove the need for
> this script by replacing hard-coded references to the Scala version with
> Flink's maven variable {{scala.binary.version}}.
> I had initially realized that the change script is [only used for
> building|https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#scala-versions]
> and not for switching the IDE environment.
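As a sketch of the proposed change (a hedged illustration; the actual artifact names in Flink's poms may differ), a Scala-dependent module would be referenced through the Maven property instead of a hard-coded suffix:

```xml
<!-- Illustrative only: the property replaces hard-coded "_2.10"/"_2.11"
     suffixes, so no script has to rewrite the poms. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
</dependency>
```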
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006708#comment-16006708
]
ASF GitHub Bot commented on FLINK-6532:
---
Github user asfgit closed the pull request at:
https://github.com/apache/flink/pull/3868
> Mesos version check
> ---
>
> Key: FLINK-6532
> URL: https://issues.apache.org/jira/browse/FLINK-6532
> Project: Flink
> Issue Type: Improvement
> Components: Mesos
>Reporter: Eron Wright
>
> The minimum requirement for the Mesos subsystem of Flink is 1.0. We should
> enforce the requirement with a version check upon connection. This may be
> accomplished by checking the 'version' property of the 'MesosInfo' structure.
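A hypothetical sketch of such a check (not Flink's actual code; the `MesosInfo` lookup is replaced here by a plain version string, since only the comparison logic is illustrated):

```java
// Toy version check enforcing the Mesos >= 1.0 minimum requirement.
public class MesosVersionCheck {
    static final int MIN_MAJOR = 1;

    /** Returns true if a dotted version string like "1.2.0" meets the minimum. */
    static boolean meetsMinimum(String version) {
        // Only the major component matters for a 1.x minimum.
        int major = Integer.parseInt(version.split("\\.")[0]);
        return major >= MIN_MAJOR;
    }

    public static void main(String[] args) {
        System.out.println(meetsMinimum("0.28.2")); // prints false
        System.out.println(meetsMinimum("1.0.1"));  // prints true
    }
}
```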
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006685#comment-16006685
]
Kostas Kloudas commented on FLINK-5753:
---
Hi [~jurijuri]! Thanks for reporting this!
I assume that your job is operating in {{processingTime}}, right?
In this case, this is (unfortunately) expected to happen, as in processing time
it is only events that trigger computations, not timers.
The reason is that, contrary to event time, where watermarks define a coarser
time granularity, in processing time there is no such thing, so we use event
arrival to regulate how frequently processing happens. If we were to register
timers in processing time that trigger computation when they fire, then:
1) we would have to introduce another parameter (something like a timer
interval) that would imitate the watermark, and
2) we would have to register a timer for each incoming element (which can lead
to a significant increase in storage requirements).
Currently we also do this for event-time timers, but for event time we have
ideas on how to change it in upcoming releases.
Is this a blocker for your use case, or did you only work in processing time for
debugging, while your use case actually uses event time?
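The storage concern in point 2 can be sketched with a toy model (our assumption, not CEP internals): registering a timeout timer per element makes the pending-timer set grow linearly with the input, while a single recurring interval timer stays constant.

```java
import java.util.PriorityQueue;

public class TimerCostDemo {

    /** Registers one timeout timer per element; returns pending-timer count. */
    static int perElementTimers(int elements, long timeoutMs) {
        PriorityQueue<Long> pending = new PriorityQueue<>();
        for (long arrival = 0; arrival < elements; arrival++) {
            pending.add(arrival + timeoutMs); // one timer per incoming element
        }
        return pending.size();
    }

    /** A single recurring "timer interval" imitating a watermark. */
    static int intervalTimer(long intervalMs) {
        PriorityQueue<Long> pending = new PriorityQueue<>();
        pending.add(intervalMs); // re-registered only when it fires
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println(perElementTimers(10_000, 30_000)); // prints 10000
        System.out.println(intervalTimer(1_000));             // prints 1
    }
}
```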
> CEP timeout handler.
>
>
> Key: FLINK-5753
> URL: https://issues.apache.org/jira/browse/FLINK-5753
> Project: Flink
> Issue Type: Bug
> Components: CEP
>Affects Versions: 1.1.2
>Reporter: MichaĆ Jurkiewicz
>
> I configured the following flink job in my environment:
> {code}
> Pattern<Event, ?> patternCommandStarted = Pattern.<Event>
> begin("event-accepted").subtype(Event.class)
> .where(e -> {event accepted where
> statement}).next("second-event-started").subtype(Event.class)
> .where(e -> {event started where statement})
> .within(Time.seconds(30));
> DataStream<Either<Event, Event>> events = CEP
> .pattern(eventsStream.keyBy(e -> e.getEventProperties().get("deviceCode")),
> patternCommandStarted)
> .select(eventSelector, eventSelector);
> static class EventSelector implements PatternSelectFunction<Event, Event>,
> PatternTimeoutFunction<Event, Event> {}
> {code}
> The problem that I have is related to timeout handling. I observed that:
> if: first event appears, second event not appear in the stream
> and *no new events appear in a stream*, timeout handler is not executed.
> Expected result: timeout handler should be executed in case if there are no
> new events in a stream
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006684#comment-16006684
]
Aljoscha Krettek commented on FLINK-6116:
-
Got it! The problem is here
https://github.com/apache/flink/blob/6181302f1ab741b86af357e4513f5952a5fc1531/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/OperatorChain.java#L264-L264
and here
https://github.com/apache/flink/blob/6181302f1ab741b86af357e4513f5952a5fc1531/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/OperatorChain.java#L118-L118.
We properly create a {{RecordWriterOutput}} for each outgoing edge in the first
snippet, but the two {{StreamEdges}} that we have are the same (regarding equals
and hashCode), so we only have one entry in the {{streamOutputMap}} and
forget about the other {{RecordWriterOutput}}. In the second snippet, where we
wire up the outputs, we wire the same output twice, because the lookup for the
two edges is the same again.
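The collapse can be reproduced with a toy stand-in (not Flink code): when two map keys are equal per equals()/hashCode(), a HashMap keeps only one entry, just as the two identical StreamEdges yield a single RecordWriterOutput.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class EdgeCollapseDemo {

    /** Simplified stand-in for StreamEdge: equality ignores which physical
     *  output the edge should be wired to. */
    static final class Edge {
        final String sourceId;
        final String targetId;

        Edge(String sourceId, String targetId) {
            this.sourceId = sourceId;
            this.targetId = targetId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Edge)) {
                return false;
            }
            Edge other = (Edge) o;
            return sourceId.equals(other.sourceId)
                    && targetId.equals(other.targetId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(sourceId, targetId);
        }
    }

    /** Puts one output per edge into an edge-keyed map; equal edges collapse. */
    static int distinctOutputs() {
        Map<Edge, String> streamOutputMap = new HashMap<>();
        // union(input) yields two edges that are equal under equals/hashCode,
        // so the second put overwrites the first entry.
        streamOutputMap.put(new Edge("source", "flatMap"), "recordWriterOutput-1");
        streamOutputMap.put(new Edge("source", "flatMap"), "recordWriterOutput-2");
        return streamOutputMap.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctOutputs()); // prints 1: one output was lost
    }
}
```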
> Watermarks don't work when unioning with same DataStream
>
>
> Key: FLINK-6116
> URL: https://issues.apache.org/jira/browse/FLINK-6116
> Project: Flink
> Issue Type: Bug
> Components: DataStream API
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.3.0
>
>
> In this example job we don't get any watermarks in the {{WatermarkObserver}}:
> {code}
> public class WatermarkTest {
> public static void main(String[] args) throws Exception {
> final StreamExecutionEnvironment env =
> StreamExecutionEnvironment.getExecutionEnvironment();
>
> env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
> env.getConfig().setAutoWatermarkInterval(1000);
> env.setParallelism(1);
> DataStreamSource<String> input = env.addSource(new
> SourceFunction<String>() {
> @Override
> public void run(SourceContext<String> ctx) throws
> Exception {
> while (true) {
> ctx.collect("hello!");
> Thread.sleep(800);
> }
> }
> @Override
> public void cancel() {
> }
> });
> input.union(input)
> .flatMap(new IdentityFlatMap())
> .transform("WatermarkOp",
> BasicTypeInfo.STRING_TYPE_INFO, new WatermarkObserver());
> env.execute();
> }
> public static class WatermarkObserver
> extends AbstractStreamOperator<String>
> implements OneInputStreamOperator<String, String> {
> @Override
> public void processElement(StreamRecord<String> element) throws
> Exception {
> System.out.println("GOT ELEMENT: " + element);
> }
> @Override
> public void processWatermark(Watermark mark) throws Exception {
> super.processWatermark(mark);
> System.out.println("GOT WATERMARK: " + mark);
> }
> }
> private static class IdentityFlatMap
> extends RichFlatMapFunction<String, String> {
> @Override
> public void flatMap(String value, Collector<String> out) throws
> Exception {
> out.collect(value);
> }
> }
> }
> When the `union` is commented out, it works.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006678#comment-16006678
]
ASF GitHub Bot commented on FLINK-4022:
---
Github user aljoscha commented on a diff in the pull request:
https://github.com/apache/flink/pull/3746#discussion_r116000971
--- Diff:
flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java
---
@@ -503,23 +644,30 @@ public void close() throws Exception {
public void initializeState(FunctionInitializationContext context)
throws Exception {
OperatorStateStore stateStore = context.getOperatorStateStore();
- offsetsStateForCheckpoint =
stateStore.getSerializableListState(DefaultOperatorStateBackend.DEFAULT_OPERATOR_STATE_NAME);
- if (context.isRestored()) {
- if (restoredState == null) {
- restoredState = new HashMap<>();
- for (Tuple2
kafkaOffset : offsetsStateForCheckpoint.get()) {
- restoredState.put(kafkaOffset.f0,
kafkaOffset.f1);
- }
+ ListState>
oldRoundRobinListState =
+
stateStore.getSerializableListState(DefaultOperatorStateBackend.DEFAULT_OPERATOR_STATE_NAME);
- LOG.info("Setting restore state in the
FlinkKafkaConsumer.");
- if (LOG.isDebugEnabled()) {
- LOG.debug("Using the following offsets:
{}", restoredState);
- }
+ this.unionOffsetStates = stateStore.getUnionListState(new
ListStateDescriptor<>(
+ OFFSETS_STATE_NAME,
+ TypeInformation.of(new
TypeHint>() {})));
+
+ if (context.isRestored() && !restoredFromOldState) {
+ restoredState = new TreeMap<>(new
KafkaTopicPartition.Comparator());
+
+ // migrate from 1.2 state, if there is any
+ for (Tuple2 kafkaOffset :
oldRoundRobinListState.get()) {
+ restoredFromOldState = true;
--- End diff --
Could it be that we restore from an old 1.2 snapshot and don't get anything
here because we simply weren't assigned any state? (For example, because the
parallelism is higher than before.)
> Partition discovery / regex topic subscription for the Kafka consumer
> -
>
> Key: FLINK-4022
> URL: https://issues.apache.org/jira/browse/FLINK-4022
> Project: Flink
> Issue Type: New Feature
> Components: Kafka Connector, Streaming Connectors
>Affects Versions: 1.0.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Critical
> Fix For: 1.3.0
>
>
> Example: allow users to subscribe to "topic-n*", so that the consumer
> automatically reads from "topic-n1", "topic-n2", ... and so on as they are
> added to Kafka.
> I propose to implement this feature by the following description:
> Since the overall list of partitions to read will change after job
> submission, the main big change required for this feature will be dynamic
> partition assignment to subtasks while the Kafka consumer is running. This
> will mainly be accomplished using Kafka 0.9.x API
> `KafkaConsumer#subscribe(java.util.regex.Pattern,
> ConsumerRebalanceListener)`. Each KafkaConsumer in each subtask will be
> added to the same consumer group when instantiated, and rely on Kafka to
> dynamically reassign partitions to them whenever a rebalance happens. The
> registered `ConsumerRebalanceListener` is a callback that is called right
> before and after rebalancing happens. We'll use this callback to let each
> subtask commit the last offsets of the partitions it is currently responsible
> for to an external store (or Kafka) before a rebalance; after the rebalance,
> when the subtasks get the new partitions they'll be reading from, they'll read
> from the external store to get the last offsets for their new partitions
> (partitions which don't have offset entries in the store are new partitions
> causing the rebalancing).
> The tricky part will be restoring Flink checkpoints when the partition
> assignment is dynamic. Snapshotting will remain the same - subtasks snapshot
> the offsets of partitions they are currently holding. Restoring will be a
> bit different in that subtasks might not be assigned matching partitions to
> the snapshot the subtask is restored with
[
https://issues.apache.org/jira/browse/FLINK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006677#comment-16006677
]
ASF GitHub Bot commented on FLINK-4022:
---
Github user aljoscha commented on a diff in the pull request:
https://github.com/apache/flink/pull/3746#discussion_r115996970
--- Diff:
flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java
---
@@ -106,31 +139,69 @@ protected AbstractFetcher(
}
}
- // create our partition state according to the
timestamp/watermark mode
- this.subscribedPartitionStates =
initializeSubscribedPartitionStates(
- assignedPartitionsWithInitialOffsets,
+ this.unassignedPartitionsQueue = new ClosableBlockingQueue<>();
+
+ // initialize subscribed partition states with seed partitions
+ this.subscribedPartitionStates = createPartitionStateHolders(
+ seedPartitionsWithInitialOffsets,
timestampWatermarkMode,
- watermarksPeriodic, watermarksPunctuated,
+ watermarksPeriodic,
+ watermarksPunctuated,
userCodeClassLoader);
- // check that all partition states have a defined offset
+ // check that all seed partition states have a defined offset
for (KafkaTopicPartitionState partitionState :
subscribedPartitionStates) {
if (!partitionState.isOffsetDefined()) {
- throw new IllegalArgumentException("The fetcher
was assigned partitions with undefined initial offsets.");
+ throw new IllegalArgumentException("The fetcher
was assigned seed partitions with undefined initial offsets.");
}
}
-
+
+ // all seed partitions are not assigned yet, so should be added
to the unassigned partitions queue
+ for (KafkaTopicPartitionState partition :
subscribedPartitionStates) {
+ unassignedPartitionsQueue.add(partition);
+ }
+
// if we have periodic watermarks, kick off the interval
scheduler
if (timestampWatermarkMode == PERIODIC_WATERMARKS) {
- KafkaTopicPartitionStateWithPeriodicWatermarks[]
parts =
-
(KafkaTopicPartitionStateWithPeriodicWatermarks[])
subscribedPartitionStates;
-
- PeriodicWatermarkEmitter periodicEmitter =
- new PeriodicWatermarkEmitter(parts,
sourceContext, processingTimeProvider, autoWatermarkInterval);
+ @SuppressWarnings("unchecked")
+ PeriodicWatermarkEmitter periodicEmitter = new
PeriodicWatermarkEmitter(
--- End diff --
Could add the generic parameter `KPH` here.
> Partition discovery / regex topic subscription for the Kafka consumer
> -
>
> Key: FLINK-4022
> URL: https://issues.apache.org/jira/browse/FLINK-4022
> Project: Flink
> Issue Type: New Feature
> Components: Kafka Connector, Streaming Connectors
>Affects Versions: 1.0.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Critical
> Fix For: 1.3.0
>
>
> Example: allow users to subscribe to "topic-n*", so that the consumer
> automatically reads from "topic-n1", "topic-n2", ... and so on as they are
> added to Kafka.
> I propose to implement this feature by the following description:
> Since the overall list of partitions to read will change after job
> submission, the main big change required for this feature will be dynamic
> partition assignment to subtasks while the Kafka consumer is running. This
> will mainly be accomplished using Kafka 0.9.x API
> `KafkaConsumer#subscribe(java.util.regex.Pattern,
> ConsumerRebalanceListener)`. Each KafkaConsumer in each subtask will be
> added to the same consumer group when instantiated, and rely on Kafka to
> dynamically reassign partitions to them whenever a rebalance happens. The
> registered `ConsumerRebalanceListener` is a callback that is called right
> before and after rebalancing happens. We'll use this callback to let each
> subtask commit the last offsets of the partitions it is currently responsible
> for to an external store (or Kafka) before a rebalance; after the rebalance,
> when the subtasks get the new partitions they'll be reading from, they'll read from
>
[
https://issues.apache.org/jira/browse/FLINK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006679#comment-16006679
]
ASF GitHub Bot commented on FLINK-4022:
---
Github user aljoscha commented on a diff in the pull request:
https://github.com/apache/flink/pull/3746#discussion_r115997026
--- Diff:
flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java
---
@@ -106,31 +139,69 @@ protected AbstractFetcher(
}
}
- // create our partition state according to the
timestamp/watermark mode
- this.subscribedPartitionStates =
initializeSubscribedPartitionStates(
- assignedPartitionsWithInitialOffsets,
+ this.unassignedPartitionsQueue = new ClosableBlockingQueue<>();
+
+ // initialize subscribed partition states with seed partitions
+ this.subscribedPartitionStates = createPartitionStateHolders(
+ seedPartitionsWithInitialOffsets,
timestampWatermarkMode,
- watermarksPeriodic, watermarksPunctuated,
+ watermarksPeriodic,
+ watermarksPunctuated,
userCodeClassLoader);
- // check that all partition states have a defined offset
+ // check that all seed partition states have a defined offset
for (KafkaTopicPartitionState partitionState :
subscribedPartitionStates) {
--- End diff --
Could add the generic parameter, but this is already existing code.
> Partition discovery / regex topic subscription for the Kafka consumer
> -
>
> Key: FLINK-4022
> URL: https://issues.apache.org/jira/browse/FLINK-4022
> Project: Flink
> Issue Type: New Feature
> Components: Kafka Connector, Streaming Connectors
>Affects Versions: 1.0.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Critical
> Fix For: 1.3.0
>
>
> Example: allow users to subscribe to "topic-n*", so that the consumer
> automatically reads from "topic-n1", "topic-n2", ... and so on as they are
> added to Kafka.
> I propose to implement this feature by the following description:
> Since the overall list of partitions to read will change after job
> submission, the main big change required for this feature will be dynamic
> partition assignment to subtasks while the Kafka consumer is running. This
> will mainly be accomplished using Kafka 0.9.x API
> `KafkaConsumer#subscribe(java.util.regex.Pattern,
> ConsumerRebalanceListener)`. Each KafkaConsumer in each subtask will be
> added to the same consumer group when instantiated, and rely on Kafka to
> dynamically reassign partitions to them whenever a rebalance happens. The
> registered `ConsumerRebalanceListener` is a callback that is called right
> before and after rebalancing happens. We'll use this callback to let each
> subtask commit the last offsets of the partitions it is currently responsible
> for to an external store (or Kafka) before a rebalance; after the rebalance,
> when the subtasks get the new partitions they'll be reading from, they'll read from
> the external store to get the last offsets for their new partitions
> (partitions which don't have offset entries in the store are new partitions
> causing the rebalancing).
> The tricky part will be restoring Flink checkpoints when the partition
> assignment is dynamic. Snapshotting will remain the same - subtasks snapshot
> the offsets of partitions they are currently holding. Restoring will be a
> bit different in that subtasks might not be assigned matching partitions to
> the snapshot the subtask is restored with (since we're letting Kafka
> dynamically assign partitions). There will need to be a coordination process
> where, if a restore state exists, all subtasks first commit the offsets they
> receive (as a result of the restore state) to the external store, and then
> all subtasks attempt to find a last offset for the partitions it is holding.
> However, if the globally merged restore state feature mentioned by
> [~StephanEwen] in https://issues.apache.org/jira/browse/FLINK-3231 is
> available, then the restore will be simple again, as each subtask has full
> access to previous global state therefore coordination is not required.
> I think changing to dynamic partition assignment is also good in the long run
> for handling topic repartitioning.
> Overall,
> User-facing API changes:
> - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern,
> DeserializationSchema, Properties)
> -
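The commit-on-revoke / restore-on-assign protocol described above can be sketched with a toy in-memory model (our assumption; partition names, the offset store, and the default offset of 0 for new partitions are all illustrative, not Flink's implementation):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RebalanceOffsetDemo {
    /** Stand-in for the external offset store (e.g. ZooKeeper or Kafka). */
    static final Map<String, Long> offsetStore = new HashMap<>();

    /** Called right before partitions are revoked: persist current offsets. */
    static void onPartitionsRevoked(Map<String, Long> currentOffsets) {
        offsetStore.putAll(currentOffsets);
    }

    /** Called after partitions are assigned: recover offsets from the store;
     *  partitions without a store entry are new and start from offset 0. */
    static Map<String, Long> onPartitionsAssigned(Set<String> assigned) {
        Map<String, Long> restored = new HashMap<>();
        for (String partition : assigned) {
            restored.put(partition, offsetStore.getOrDefault(partition, 0L));
        }
        return restored;
    }

    public static void main(String[] args) {
        Map<String, Long> held = new HashMap<>();
        held.put("topic-n1-0", 42L);
        onPartitionsRevoked(held); // commit before the rebalance

        Set<String> newAssignment = new HashSet<>();
        newAssignment.add("topic-n1-0"); // previously held partition
        newAssignment.add("topic-n2-0"); // newly discovered partition
        Map<String, Long> restored = onPartitionsAssigned(newAssignment);
        System.out.println(restored.get("topic-n1-0")); // prints 42
        System.out.println(restored.get("topic-n2-0")); // prints 0
    }
}
```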
Github user aljoscha commented on a diff in the pull request:
https://github.com/apache/flink/pull/3746#discussion_r115996970
--- Diff:
flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java
---
@@ -106,31 +139,69 @@ protected AbstractFetcher(
}
}
- // create our partition state according to the
timestamp/watermark mode
- this.subscribedPartitionStates =
initializeSubscribedPartitionStates(
- assignedPartitionsWithInitialOffsets,
+ this.unassignedPartitionsQueue = new ClosableBlockingQueue<>();
+
+ // initialize subscribed partition states with seed partitions
+ this.subscribedPartitionStates = createPartitionStateHolders(
+ seedPartitionsWithInitialOffsets,
timestampWatermarkMode,
- watermarksPeriodic, watermarksPunctuated,
+ watermarksPeriodic,
+ watermarksPunctuated,
userCodeClassLoader);
- // check that all partition states have a defined offset
+ // check that all seed partition states have a defined offset
for (KafkaTopicPartitionState partitionState :
subscribedPartitionStates) {
if (!partitionState.isOffsetDefined()) {
- throw new IllegalArgumentException("The fetcher
was assigned partitions with undefined initial offsets.");
+ throw new IllegalArgumentException("The fetcher
was assigned seed partitions with undefined initial offsets.");
}
}
-
+
+ // all seed partitions are not assigned yet, so should be added
to the unassigned partitions queue
+ for (KafkaTopicPartitionState partition :
subscribedPartitionStates) {
+ unassignedPartitionsQueue.add(partition);
+ }
+
// if we have periodic watermarks, kick off the interval
scheduler
if (timestampWatermarkMode == PERIODIC_WATERMARKS) {
- KafkaTopicPartitionStateWithPeriodicWatermarks[]
parts =
-
(KafkaTopicPartitionStateWithPeriodicWatermarks[])
subscribedPartitionStates;
-
- PeriodicWatermarkEmitter periodicEmitter =
- new PeriodicWatermarkEmitter(parts,
sourceContext, processingTimeProvider, autoWatermarkInterval);
+ @SuppressWarnings("unchecked")
+ PeriodicWatermarkEmitter periodicEmitter = new
PeriodicWatermarkEmitter(
--- End diff --
Could add the generic parameter `KPH` here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Github user aljoscha commented on a diff in the pull request:
https://github.com/apache/flink/pull/3746#discussion_r116000971
--- Diff:
flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java
---
@@ -503,23 +644,30 @@ public void close() throws Exception {
public void initializeState(FunctionInitializationContext context)
throws Exception {
OperatorStateStore stateStore = context.getOperatorStateStore();
- offsetsStateForCheckpoint =
stateStore.getSerializableListState(DefaultOperatorStateBackend.DEFAULT_OPERATOR_STATE_NAME);
- if (context.isRestored()) {
- if (restoredState == null) {
- restoredState = new HashMap<>();
- for (Tuple2
kafkaOffset : offsetsStateForCheckpoint.get()) {
- restoredState.put(kafkaOffset.f0,
kafkaOffset.f1);
- }
+ ListState>
oldRoundRobinListState =
+
stateStore.getSerializableListState(DefaultOperatorStateBackend.DEFAULT_OPERATOR_STATE_NAME);
- LOG.info("Setting restore state in the
FlinkKafkaConsumer.");
- if (LOG.isDebugEnabled()) {
- LOG.debug("Using the following offsets:
{}", restoredState);
- }
+ this.unionOffsetStates = stateStore.getUnionListState(new
ListStateDescriptor<>(
+ OFFSETS_STATE_NAME,
+ TypeInformation.of(new
TypeHint>() {})));
+
+ if (context.isRestored() && !restoredFromOldState) {
+ restoredState = new TreeMap<>(new
KafkaTopicPartition.Comparator());
+
+ // migrate from 1.2 state, if there is any
+ for (Tuple2 kafkaOffset :
oldRoundRobinListState.get()) {
+ restoredFromOldState = true;
--- End diff --
Could it be that we restore from an old 1.2 snapshot and don't get anything
here because we simply weren't assigned any state. (For example because the
parallelism is higher than before.)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Github user aljoscha commented on a diff in the pull request:
https://github.com/apache/flink/pull/3746#discussion_r115997026
--- Diff:
flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java
---
@@ -106,31 +139,69 @@ protected AbstractFetcher(
}
}
    -		// create our partition state according to the timestamp/watermark mode
    -		this.subscribedPartitionStates = initializeSubscribedPartitionStates(
    -			assignedPartitionsWithInitialOffsets,
    +		this.unassignedPartitionsQueue = new ClosableBlockingQueue<>();
    +
    +		// initialize subscribed partition states with seed partitions
    +		this.subscribedPartitionStates = createPartitionStateHolders(
    +				seedPartitionsWithInitialOffsets,
    				timestampWatermarkMode,
    -			watermarksPeriodic, watermarksPunctuated,
    +				watermarksPeriodic,
    +				watermarksPunctuated,
    				userCodeClassLoader);
    -		// check that all partition states have a defined offset
    +		// check that all seed partition states have a defined offset
    		for (KafkaTopicPartitionState partitionState : subscribedPartitionStates) {
--- End diff --
Could add generic parameter, but is already existing code.
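The "generic parameter" remark refers to iterating with a raw type; a minimal sketch of the wildcard alternative, using a stand-in class rather than Flink's actual `KafkaTopicPartitionState`:

```java
import java.util.List;

// Stand-in for a generic partition-state holder; not Flink's actual class.
class PartitionState<KPH> {
    private final KPH handle;
    private volatile long offset = Long.MIN_VALUE; // sentinel: offset not defined

    PartitionState(KPH handle) { this.handle = handle; }

    boolean isOffsetDefined() { return offset != Long.MIN_VALUE; }
    void setOffset(long offset) { this.offset = offset; }
}

public class GenericsSketch {

    // Using the wildcard `PartitionState<?>` instead of the raw type keeps the
    // loop type-safe without committing to a concrete kafka-handle type.
    static boolean allOffsetsDefined(List<? extends PartitionState<?>> states) {
        for (PartitionState<?> state : states) {
            if (!state.isOffsetDefined()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PartitionState<String> defined = new PartitionState<>("partition-0");
        defined.setOffset(42L);
        PartitionState<String> undefined = new PartitionState<>("partition-1");
        System.out.println(allOffsetsDefined(List.of(defined, undefined)));
    }
}
```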
---
[
https://issues.apache.org/jira/browse/FLINK-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006662#comment-16006662
]
ASF GitHub Bot commented on FLINK-6558:
---
GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/3874
[FLINK-6558] Disable yarn tests on Windows
This PR disables several YARN tests on Windows. They fail because they
try to start an HDFS cluster.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink 6558_yarn_tests
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3874.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3874
commit ed48f4ffed5bb403f29bb2cf4350d3d16d69f307
Author: zentol
Date: 2017-05-11T14:55:11Z
[FLINK-6558] Disable yarn tests on Windows
> Yarn tests fail on Windows
> --
>
> Key: FLINK-6558
> URL: https://issues.apache.org/jira/browse/FLINK-6558
> Project: Flink
> Issue Type: Improvement
> Components: Tests, YARN
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> The following YARN tests fail on Windows since they try to start up an HDFS
> cluster. This requires the Windows Hadoop extensions, which we can't ship, so
> these tests should be disabled on Windows.
> {code}
> YarnIntraNonHaMasterServicesTest
> YarnPreConfiguredMasterHaServicesTest
> YarnApplicationMasterRunnerTest
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---
Chesnay Schepler created FLINK-6558:
---
Summary: Yarn tests fail on Windows
Key: FLINK-6558
URL: https://issues.apache.org/jira/browse/FLINK-6558
Project: Flink
Issue Type: Improvement
Components: Tests, YARN
Affects Versions: 1.3.0, 1.4.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
The following YARN tests fail on Windows since they try to start up an HDFS
cluster. This requires the Windows Hadoop extensions, which we can't ship, so
these tests should be disabled on Windows.
{code}
YarnIntraNonHaMasterServicesTest
YarnPreConfiguredMasterHaServicesTest
YarnApplicationMasterRunnerTest
{code}
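One common way to gate such tests per OS (a generic JUnit-style sketch; Flink's actual guard mechanism may differ) is an assumption on the `os.name` system property:

```java
// Generic sketch of gating tests per OS; Flink's actual guard may differ.
public class OsCheck {

    // Decide from the JVM's os.name property whether we are on Windows.
    static boolean isWindows(String osName) {
        return osName.toLowerCase().startsWith("windows");
    }

    public static void main(String[] args) {
        // In a JUnit 4 test this check would typically back an assumption, e.g.:
        //   Assume.assumeFalse("HDFS mini-cluster unsupported on Windows",
        //       isWindows(System.getProperty("os.name")));
        // which skips (rather than fails) the test on Windows.
        System.out.println("on Windows: " + isWindows(System.getProperty("os.name")));
    }
}
```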
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
Chesnay Schepler created FLINK-6557:
---
Summary: RocksDbStateBackendTest fails on Windows
Key: FLINK-6557
URL: https://issues.apache.org/jira/browse/FLINK-6557
Project: Flink
Issue Type: Improvement
Components: State Backends, Checkpointing, Tests
Affects Versions: 1.3.0, 1.4.0
Reporter: Chesnay Schepler
The {{RocksDbStateBackendTest}} fails on Windows when incremental checkpointing
is enabled.
Based on the exception, I guess the file name is simply too long:
{code}
org.rocksdb.RocksDBException: IO error: Failed to create dir:
/C:/Users/Zento/AppData/Local/Temp/junit572330160893758355/junit5754599533651878867/job-ecbdb9df76fd3a39108dac7c515e3214_op-Test_uuid-6a43f1f6-1f35-443e-945c-aab3643e62fc/chk-0.tmp:
Invalid argument
{code}
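The reporter's guess points at Windows' historical MAX_PATH limit of roughly 260 characters for paths without the `\\?\` prefix; a small hypothetical guard illustrates the check (the constant and method names are assumptions for illustration, not taken from Flink or RocksDB):

```java
// Hypothetical guard against Windows' historical MAX_PATH limit.
public class PathCheck {

    // 260 is the classic Windows limit for paths without the \\?\ prefix.
    static final int WINDOWS_MAX_PATH = 260;

    static boolean exceedsWindowsLimit(String absolutePath) {
        return absolutePath.length() >= WINDOWS_MAX_PATH;
    }

    public static void main(String[] args) {
        String shallow = "C:\\tmp\\chk-0";
        // nested temp dirs plus a long job/operator/UUID segment, as in the report
        String deep = "C:\\Users\\" + "x".repeat(300) + "\\chk-0.tmp";
        System.out.println(exceedsWindowsLimit(shallow)); // false
        System.out.println(exceedsWindowsLimit(deep));    // true
    }
}
```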
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[
https://issues.apache.org/jira/browse/FLINK-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006649#comment-16006649
]
ASF GitHub Bot commented on FLINK-6555:
---
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/3873
[FLINK-6555] [futures] Generalize ConjunctFuture to return results
The ConjunctFuture now returns the set of values of the individual futures
it is composed of once it is completed.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink generalizeConjunctFuture
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3873.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3873
commit a6fc20d9f8cda04a835459f38ed885e87f3d478b
Author: Till Rohrmann
Date: 2017-05-11T15:36:17Z
[FLINK-6555] [futures] Generalize ConjunctFuture to return results
The ConjunctFuture now returns the set of future values once it is
completed.
> Generalize ConjunctFuture
> -
>
> Key: FLINK-6555
> URL: https://issues.apache.org/jira/browse/FLINK-6555
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
>Affects Versions: 1.4.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Trivial
> Fix For: 1.4.0
>
>
> The {{ConjunctFuture}} allows combining multiple {{Futures}} into one. At
> the moment it does not return the collection of results of the individual
> futures. In some cases this information is helpful and should therefore be
> returned.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
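The generalization described above can be sketched with plain `java.util.concurrent`: combine many futures into one that completes with the list of all their results, in input order. This is an illustration of the idea, not Flink's actual ConjunctFuture implementation:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ConjunctSketch {

    // Combine futures into one that completes with the list of all results,
    // preserving the order of the input futures.
    static <T> CompletableFuture<List<T>> conjunct(List<CompletableFuture<T>> futures) {
        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join) // safe: all complete by now
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<CompletableFuture<Integer>> futures = List.of(
                CompletableFuture.completedFuture(1),
                CompletableFuture.completedFuture(2),
                CompletableFuture.completedFuture(3));
        System.out.println(conjunct(futures).join());
    }
}
```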
---