[jira] [Closed] (FLINK-14825) Rework state processor api documentation
[ https://issues.apache.org/jira/browse/FLINK-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber closed FLINK-14825.
---
Fix Version/s: 1.9.2
Resolution: Fixed

merged to
* master: bd56224c3063fd23d508a4250e5698d4840fa488
* release-1.10: b11b010aaacfc6e65d5c703d22e39e642121ce38
* release-1.9: 24f5c5cd901332761d8deaa85f208f8ad2514b2b

> Rework state processor api documentation
>
> Key: FLINK-14825
> URL: https://issues.apache.org/jira/browse/FLINK-14825
> Project: Flink
> Issue Type: Improvement
> Components: API / State Processor, Documentation
> Reporter: Seth Wiesman
> Assignee: Seth Wiesman
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.9.2, 1.10.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The current version of the state processor API docs was rushed to meet the
> 1.9 release. We should rewrite them to be more complete, include better
> examples, etc.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-11789) Checkpoint directories are not cleaned up after job termination
[ https://issues.apache.org/jira/browse/FLINK-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008837#comment-17008837 ]

Nico Kruber commented on FLINK-11789:
-

One more thing to consider: I was using Azure Blob Storage for the checkpoint and savepoint directories and a job with no checkpoints. Now, I took some savepoints and still have the aforementioned directories lying around (while the job is running and afterwards, as elaborated above). Since a savepoint essentially/almost does the same thing as a checkpoint, I do get the part of these directories being there, but without active checkpoints, they are strictly not even necessary in the first place. However, I just wanted to point out that this also affects jobs with savepoints only.

> Checkpoint directories are not cleaned up after job termination
>
> Key: FLINK-11789
> URL: https://issues.apache.org/jira/browse/FLINK-11789
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.9.0
> Reporter: Till Rohrmann
> Priority: Major
>
> Flink currently does not clean up all checkpoint directories when a job
> reaches a globally terminal state. Having configured the checkpoint directory
> {{checkpoints}}, I observe that after cancelling the job {{JOB_ID}} there are
> still
> {code}
> checkpoints/JOB_ID/shared
> checkpoints/JOB_ID/taskowned
> {code}
> I think it would be good if we deleted {{checkpoints/JOB_ID}}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15335) add-dependencies-for-IDEA not working anymore and dangerous in general
Nico Kruber created FLINK-15335:
---

Summary: add-dependencies-for-IDEA not working anymore and dangerous in general
Key: FLINK-15335
URL: https://issues.apache.org/jira/browse/FLINK-15335
Project: Flink
Issue Type: Bug
Components: Documentation, Quickstarts
Affects Versions: 1.9.1, 1.8.3, 1.10.0
Reporter: Nico Kruber

The quickstarts' {{add-dependencies-for-IDEA}} profile (for including {{flink-runtime}} and further dependencies that are usually {{provided}}) is not automatically enabled by IntelliJ anymore, since the {{idea.version}} property has not been set by IntelliJ for a couple of versions now. My IntelliJ, for example, sets {{idea.version2019.3}} instead, but even if the profile activation is changed to that, IntelliJ does not enable it by default.

There are two workarounds:
* Tick {{Include dependencies with "Provided" scope}} in the run configuration (available in any newer IntelliJ version, probably since 2018), or
* enable the profile manually - downside: if you create a jar inside IntelliJ via its own Maven targets, the jar would contain the provided dependencies and thus be unsuitable for submission to a Flink cluster.

I propose to remove the {{add-dependencies-for-IDEA}} profile for good (from the quickstarts) and adapt the documentation accordingly, e.g. [https://ci.apache.org/projects/flink/flink-docs-stable/dev/projectsetup/dependencies.html#setting-up-a-project-basic-dependencies]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
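For context, a property-based profile activation of the kind described above looks roughly like the following. This is a sketch, not the actual quickstart pom; the profile body is abbreviated and only illustrates the mechanism that stopped working:

```xml
<!-- Sketch of a property-activated Maven profile like the quickstarts'
     add-dependencies-for-IDEA. Activation depends on IntelliJ injecting
     the idea.version system property into the Maven import, which newer
     IntelliJ versions no longer do, so the profile never triggers. -->
<profile>
  <id>add-dependencies-for-IDEA</id>
  <activation>
    <property>
      <name>idea.version</name>
    </property>
  </activation>
  <dependencies>
    <!-- the usually "provided" dependencies repeated with compile scope,
         e.g. flink-runtime; abbreviated in this sketch -->
  </dependencies>
</profile>
```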
[jira] [Updated] (FLINK-15298) Wrong dependencies in the DataStream API tutorial (the wiki-edits example)
[ https://issues.apache.org/jira/browse/FLINK-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-15298:
Affects Version/s: 1.7.2
                   1.8.3

> Wrong dependencies in the DataStream API tutorial (the wiki-edits example)
>
> Key: FLINK-15298
> URL: https://issues.apache.org/jira/browse/FLINK-15298
> Project: Flink
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.7.2, 1.8.3, 1.9.0, 1.9.1
> Reporter: Jun Qin
> Priority: Major
>
> [The DataStream API Tutorial in Flink 1.9 | https://ci.apache.org/projects/flink/flink-docs-release-1.9/getting-started/tutorials/datastream_api.html]
> mentions the following dependencies:
> {code:java}
> <dependencies>
>     <dependency>
>         <groupId>org.apache.flink</groupId>
>         <artifactId>flink-java</artifactId>
>         <version>${flink.version}</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.flink</groupId>
>         <artifactId>flink-streaming-java_2.11</artifactId>
>         <version>${flink.version}</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.flink</groupId>
>         <artifactId>flink-clients_2.11</artifactId>
>         <version>${flink.version}</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.flink</groupId>
>         <artifactId>flink-connector-wikiedits_2.11</artifactId>
>         <version>${flink.version}</version>
>     </dependency>
> </dependencies>
> {code}
> There are two issues here:
> # {{flink-java}} and {{flink-streaming-java}} should be set to *provided* scope
> # {{flink-clients}} is not needed. If {{flink-clients}} is added into *compile* scope, {{flink-runtime}} will be added implicitly

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15298) Wrong dependencies in the DataStream API tutorial (the wiki-edits example)
[ https://issues.apache.org/jira/browse/FLINK-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-15298:
Affects Version/s: (was: 1.9.0)

> Wrong dependencies in the DataStream API tutorial (the wiki-edits example)
>
> Key: FLINK-15298
> URL: https://issues.apache.org/jira/browse/FLINK-15298
> Project: Flink
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.7.2, 1.8.3, 1.9.1
> Reporter: Jun Qin
> Priority: Major
>
> [The DataStream API Tutorial in Flink 1.9 | https://ci.apache.org/projects/flink/flink-docs-release-1.9/getting-started/tutorials/datastream_api.html]
> mentions the following dependencies:
> {code:java}
> <dependencies>
>     <dependency>
>         <groupId>org.apache.flink</groupId>
>         <artifactId>flink-java</artifactId>
>         <version>${flink.version}</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.flink</groupId>
>         <artifactId>flink-streaming-java_2.11</artifactId>
>         <version>${flink.version}</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.flink</groupId>
>         <artifactId>flink-clients_2.11</artifactId>
>         <version>${flink.version}</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.flink</groupId>
>         <artifactId>flink-connector-wikiedits_2.11</artifactId>
>         <version>${flink.version}</version>
>     </dependency>
> </dependencies>
> {code}
> There are two issues here:
> # {{flink-java}} and {{flink-streaming-java}} should be set to *provided* scope
> # {{flink-clients}} is not needed. If {{flink-clients}} is added into *compile* scope, {{flink-runtime}} will be added implicitly

-- This message was sent by Atlassian Jira (v8.3.4#803005)
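The fix the ticket asks for can be sketched as follows (a sketch of the corrected dependency section, not the actual tutorial pom; only the scopes change and {{flink-clients}} is dropped):

```xml
<dependencies>
  <!-- core APIs: provided scope, since they ship with the Flink distribution -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
  </dependency>
  <!-- flink-clients_2.11 removed: not needed here, and in compile scope it
       pulls in flink-runtime transitively -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-wikiedits_2.11</artifactId>
    <version>${flink.version}</version>
  </dependency>
</dependencies>
```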
[jira] [Commented] (FLINK-15010) Temp directories flink-netty-shuffle-* are not cleaned up
[ https://issues.apache.org/jira/browse/FLINK-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998148#comment-16998148 ] Nico Kruber commented on FLINK-15010: - I used {{start-cluster.sh}} and added {{localhost}} twice into {{conf/slaves}} > Temp directories flink-netty-shuffle-* are not cleaned up > - > > Key: FLINK-15010 > URL: https://issues.apache.org/jira/browse/FLINK-15010 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Affects Versions: 1.9.1 >Reporter: Nico Kruber >Priority: Major > > Starting a Flink cluster with 2 TMs and stopping it again will leave 2 > temporary directories (and not delete them): flink-netty-shuffle- -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-14942) State Processing API: add an option to make deep copy
[ https://issues.apache.org/jira/browse/FLINK-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber reassigned FLINK-14942:
---
Assignee: Jun Qin

> State Processing API: add an option to make deep copy
>
> Key: FLINK-14942
> URL: https://issues.apache.org/jira/browse/FLINK-14942
> Project: Flink
> Issue Type: Improvement
> Components: API / State Processor
> Affects Versions: 1.11.0
> Reporter: Jun Qin
> Assignee: Jun Qin
> Priority: Major
>
> Currently, when a new savepoint is created based on a source savepoint, there
> are references in the new savepoint to the source savepoint. Here is what the
> [State Processing API doc|https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/libs/state_processor_api.html]
> says:
> bq. Note: When basing a new savepoint on existing state, the state processor
> api makes a shallow copy of the pointers to the existing operators. This
> means that both savepoints share state and one cannot be deleted without
> corrupting the other!
> This JIRA is to request an option to have a deep copy (instead of a shallow
> copy) such that the new savepoint is self-contained.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (FLINK-15113) fs.azure.account.key not hidden from global configuration
[ https://issues.apache.org/jira/browse/FLINK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber resolved FLINK-15113. - Resolution: Fixed Fixed in * master: e9afee736acaaf8c74c66c52fa651d565cd48b10 * release-1.9: 1b490927d391baaef4bce7421461a6eb2bd66254 > fs.azure.account.key not hidden from global configuration > - > > Key: FLINK-15113 > URL: https://issues.apache.org/jira/browse/FLINK-15113 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.9.1 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.9.2 > > Time Spent: 20m > Remaining Estimate: 0h > > For access to Azure's Blob Storage, you need to provide the (secret) key with > {{fs.azure.account.key..core.windows.net}} > This value, however, is not hidden from the global configuration which only > specifies configurations with keys containing "password" or "secret" as > sensitive. > We should add {{fs.azure.account.key}} to that list as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15113) fs.azure.account.key not hidden from global configuration
[ https://issues.apache.org/jira/browse/FLINK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-15113: Fix Version/s: (was: 1.8.3) > fs.azure.account.key not hidden from global configuration > - > > Key: FLINK-15113 > URL: https://issues.apache.org/jira/browse/FLINK-15113 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.9.1 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.9.2 > > Time Spent: 10m > Remaining Estimate: 0h > > For access to Azure's Blob Storage, you need to provide the (secret) key with > {{fs.azure.account.key..core.windows.net}} > This value, however, is not hidden from the global configuration which only > specifies configurations with keys containing "password" or "secret" as > sensitive. > We should add {{fs.azure.account.key}} to that list as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15113) fs.azure.account.key not hidden from global configuration
[ https://issues.apache.org/jira/browse/FLINK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-15113: Fix Version/s: 1.9.2 1.8.3 > fs.azure.account.key not hidden from global configuration > - > > Key: FLINK-15113 > URL: https://issues.apache.org/jira/browse/FLINK-15113 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.9.1 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.8.3, 1.9.2 > > Time Spent: 10m > Remaining Estimate: 0h > > For access to Azure's Blob Storage, you need to provide the (secret) key with > {{fs.azure.account.key..core.windows.net}} > This value, however, is not hidden from the global configuration which only > specifies configurations with keys containing "password" or "secret" as > sensitive. > We should add {{fs.azure.account.key}} to that list as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15113) fs.azure.account.key not hidden from global configuration
[ https://issues.apache.org/jira/browse/FLINK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-15113: Description: For access to Azure's Blob Storage, you need to provide the (secret) key with {{fs.azure.account.key..core.windows.net}} This value, however, is not hidden from the global configuration which only specifies configurations with keys containing "password" or "secret" as sensitive. We should add {{fs.azure.account.key}} to that list as well. was: For access to Azrue's Blob Storage, you need to provide the (secret) key with {{fs.azure.account.key..core.windows.net}} This value, however, is not hidden from the global configuration which only specifies configurations with keys containing "password" or "secret" as sensitive. We should add {{fs.azure.account.key}} to that list as well. > fs.azure.account.key not hidden from global configuration > - > > Key: FLINK-15113 > URL: https://issues.apache.org/jira/browse/FLINK-15113 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.9.1 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Fix For: 1.10.0 > > > For access to Azure's Blob Storage, you need to provide the (secret) key with > {{fs.azure.account.key..core.windows.net}} > This value, however, is not hidden from the global configuration which only > specifies configurations with keys containing "password" or "secret" as > sensitive. > We should add {{fs.azure.account.key}} to that list as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15113) fs.azure.account.key not hidden from global configuration
Nico Kruber created FLINK-15113:
---

Summary: fs.azure.account.key not hidden from global configuration
Key: FLINK-15113
URL: https://issues.apache.org/jira/browse/FLINK-15113
Project: Flink
Issue Type: Improvement
Components: Runtime / Web Frontend
Affects Versions: 1.9.1
Reporter: Nico Kruber
Assignee: Nico Kruber
Fix For: 1.10.0

For access to Azure's Blob Storage, you need to provide the (secret) key with {{fs.azure.account.key..core.windows.net}}

This value, however, is not hidden from the global configuration, which only specifies configurations with keys containing "password" or "secret" as sensitive.

We should add {{fs.azure.account.key}} to that list as well.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
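The hiding the ticket asks for amounts to a key-based filter over the configuration map. A minimal, self-contained sketch of that idea (this is not Flink's actual implementation; the class, method, and constant names are made up for illustration):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SensitiveConfigMasker {
    // Key fragments treated as sensitive; "fs.azure.account.key" is the
    // addition this ticket proposes. Names here are illustrative only.
    private static final List<String> HIDDEN_KEY_PARTS =
            Arrays.asList("password", "secret", "fs.azure.account.key");

    /** Returns a copy of the config with sensitive values replaced by "******". */
    public static Map<String, String> maskSensitiveValues(Map<String, String> config) {
        Map<String, String> masked = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            String key = entry.getKey().toLowerCase();
            boolean sensitive = HIDDEN_KEY_PARTS.stream().anyMatch(key::contains);
            masked.put(entry.getKey(), sensitive ? "******" : entry.getValue());
        }
        return masked;
    }
}
```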
[jira] [Created] (FLINK-15068) Disable RocksDB's local LOG by default
Nico Kruber created FLINK-15068:
---

Summary: Disable RocksDB's local LOG by default
Key: FLINK-15068
URL: https://issues.apache.org/jira/browse/FLINK-15068
Project: Flink
Issue Type: Improvement
Components: Runtime / State Backends
Affects Versions: 1.9.1, 1.8.2, 1.7.2
Reporter: Nico Kruber
Assignee: Nico Kruber
Fix For: 1.10.0

With Flink's default settings for RocksDB, it will write a log file (not the WAL, but pure logging statements) into the data folder. Besides periodic statistics, it will log compaction attempts, new memtable creations, flushes, etc. A few things to note about this practice:
# *this LOG file is growing over time with no limit (!)*
# the default logging level is INFO
# the statistics in there may help looking into performance and/or disk space problems (but maybe you should be looking at and monitoring metrics instead)
# this file is not useful for debugging errors since it will be deleted along with the local dir when the TM goes down

With a custom {{OptionsFactory}}, the user can change the behaviour like the following:
{code:java}
@Override
public DBOptions createDBOptions(DBOptions currentOptions) {
    currentOptions = super.createDBOptions(currentOptions);
    currentOptions.setKeepLogFileNum(10);
    currentOptions.setInfoLogLevel(InfoLogLevel.WARN_LEVEL);
    currentOptions.setStatsDumpPeriodSec(0);
    currentOptions.setMaxLogFileSize(1024 * 1024); // 1 MB each
    return currentOptions;
}
{code}
However, the rotating logger currently does not work (it will not delete old log files - see [https://github.com/dataArtisans/frocksdb/pull/12]). Also, the user should not have to write their own {{OptionsFactory}} to get a sensible default.

To prevent this file from filling up the disk, I propose to change Flink's default RocksDB settings so that the LOG file is effectively disabled (nothing is written to it by default).

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-15012) Checkpoint directory not cleaned up
[ https://issues.apache.org/jira/browse/FLINK-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986760#comment-16986760 ]

Nico Kruber commented on FLINK-15012:
-

Well, we do have a lot of temp directories that will be deleted with {{stop-cluster.sh}}, e.g. blobStorage or flink-io. However, the checkpoint directory may be special because it is shared between the JobManager and the TaskManager processes. Even if the JobManager cleans this up, some TaskManager could still be writing to it in case a checkpoint was concurrently being created.

I did not try, but I am a bit concerned whether this may happen in a real cluster setup as well, for example in K8s where you may kill the Flink cluster (along with all running jobs) through K8s. Since we claim that the checkpoint lifecycle is managed by Flink, it should actually always do the cleanup*

Looking at the code you linked for ZooKeeperCompletedCheckpointStore as well as how StandaloneCompletedCheckpointStore implement their {{shutdown()}} method, I am also wondering why they only clean up completed checkpoints. Shouldn't they also clean up in-progress checkpoints (if possible)?

* There may be some strings attached, but then they would need to be documented so that DevOps may account for that and eventually do a manual cleanup (if the checkpoint path lets them identify what to delete).

> Checkpoint directory not cleaned up
>
> Key: FLINK-15012
> URL: https://issues.apache.org/jira/browse/FLINK-15012
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.9.1
> Reporter: Nico Kruber
> Priority: Major
>
> I started a Flink cluster with 2 TMs using {{start-cluster.sh}} and the
> following config (in addition to the default {{flink-conf.yaml}})
> {code:java}
> state.checkpoints.dir: file:///path/to/checkpoints/
> state.backend: rocksdb {code}
> After submitting a job with checkpoints enabled (every 5s), checkpoints show
> up, e.g.
> {code:java}
> bb969f842bbc0ecc3b41b7fbe23b047b/
> ├── chk-2
> │   ├── 238969e1-6949-4b12-98e7-1411c186527c
> │   ├── 2702b226-9cfc-4327-979d-e5508ab2e3d5
> │   ├── 4c51cb24-6f71-4d20-9d4c-65ed6e826949
> │   ├── e706d574-c5b2-467a-8640-1885ca252e80
> │   └── _metadata
> ├── shared
> └── taskowned
> {code}
> If I shut down the cluster via {{stop-cluster.sh}}, these files will remain
> on disk and not be cleaned up.
> In contrast, if I cancel the job, at least {{chk-2}} will be deleted, but
> still leaving the (empty) directories.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-15011) RocksDB temp directory not cleaned up
[ https://issues.apache.org/jira/browse/FLINK-15011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-15011. --- Resolution: Duplicate > RocksDB temp directory not cleaned up > - > > Key: FLINK-15011 > URL: https://issues.apache.org/jira/browse/FLINK-15011 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.9.1 >Reporter: Nico Kruber >Priority: Major > > When starting a Flink cluster with 2 TMs, then starting a job with RocksDB > with > {code:java} > state.backend: rocksdb {code} > it will create temp directories {{rocksdb-lib-}} where it extracts the > native libraries to. After shutting down the Flink cluster, these directories > remain (but their contents are cleaned up at least). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14378) Cleanup rocksDB lib folder if fail to load library
[ https://issues.apache.org/jira/browse/FLINK-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986741#comment-16986741 ]

Nico Kruber commented on FLINK-14378:
-

I believe a proper cleanup should cover both scenarios, and a fix for this one probably also fixes the other issue. I'm closing FLINK-15011 as a duplicate. Just to clarify here: we should also clean up the {{rocksdb-lib-}} directory upon graceful shutdown.

> Cleanup rocksDB lib folder if fail to load library
>
> Key: FLINK-14378
> URL: https://issues.apache.org/jira/browse/FLINK-14378
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: Yun Tang
> Assignee: Yun Tang
> Priority: Major
>
> This improvement is inspired by the fact that some of our machines need some
> time to load the RocksDB library. When other unrecoverable exceptions keep
> happening, the process of loading the library can be interrupted, which
> causes the {{rocksdb-lib}} folder to be created but not cleaned up. As the
> job continues to fail over, more and more {{rocksdb-lib}} folders are
> created. We even came across a machine that ran out of inodes!
> Details can be found in the current
> [implementation|https://github.com/apache/flink/blob/80b27a150026b7b5cb707bd9fa3e17f565bb8112/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBStateBackend.java#L860]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
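The cleanup discussed above boils down to recursively deleting the extracted {{rocksdb-lib-*}} directory, both on load failure and on graceful shutdown. A self-contained sketch of such a recursive delete (this is not Flink's actual code; the class and method names are made up for illustration):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class TempDirCleanup {
    /**
     * Recursively deletes a directory tree, e.g. a rocksdb-lib-* folder left
     * over from extracting native libraries. Entries are deleted deepest
     * first, so each directory is empty by the time it is removed.
     */
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to do, e.g. already cleaned up
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

In a state backend, this would be invoked both from the exception handler around the native-library extraction and from the dispose/shutdown path.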
[jira] [Created] (FLINK-15012) Checkpoint directory not cleaned up
Nico Kruber created FLINK-15012:
---

Summary: Checkpoint directory not cleaned up
Key: FLINK-15012
URL: https://issues.apache.org/jira/browse/FLINK-15012
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Affects Versions: 1.9.1
Reporter: Nico Kruber

I started a Flink cluster with 2 TMs using {{start-cluster.sh}} and the following config (in addition to the default {{flink-conf.yaml}})
{code:java}
state.checkpoints.dir: file:///path/to/checkpoints/
state.backend: rocksdb {code}
After submitting a job with checkpoints enabled (every 5s), checkpoints show up, e.g.
{code:java}
bb969f842bbc0ecc3b41b7fbe23b047b/
├── chk-2
│   ├── 238969e1-6949-4b12-98e7-1411c186527c
│   ├── 2702b226-9cfc-4327-979d-e5508ab2e3d5
│   ├── 4c51cb24-6f71-4d20-9d4c-65ed6e826949
│   ├── e706d574-c5b2-467a-8640-1885ca252e80
│   └── _metadata
├── shared
└── taskowned {code}
If I shut down the cluster via {{stop-cluster.sh}}, these files will remain on disk and not be cleaned up.
In contrast, if I cancel the job, at least {{chk-2}} will be deleted, but still leaving the (empty) directories.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15011) RocksDB temp directory not cleaned up
Nico Kruber created FLINK-15011: --- Summary: RocksDB temp directory not cleaned up Key: FLINK-15011 URL: https://issues.apache.org/jira/browse/FLINK-15011 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Affects Versions: 1.9.1 Reporter: Nico Kruber When starting a Flink cluster with 2 TMs, then starting a job with RocksDB with {code:java} state.backend: rocksdb {code} it will create temp directories {{rocksdb-lib-}} where it extracts the native libraries to. After shutting down the Flink cluster, these directories remain (but their contents are cleaned up at least). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15010) Temp directories flink-netty-shuffle-* are not cleaned up
Nico Kruber created FLINK-15010: --- Summary: Temp directories flink-netty-shuffle-* are not cleaned up Key: FLINK-15010 URL: https://issues.apache.org/jira/browse/FLINK-15010 Project: Flink Issue Type: Improvement Components: Runtime / Network Affects Versions: 1.9.1 Reporter: Nico Kruber Starting a Flink cluster with 2 TMs and stopping it again will leave 2 temporary directories (and not delete them): flink-netty-shuffle- -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-14890) TestHarness for KeyedBroadcastProcessFunction
[ https://issues.apache.org/jira/browse/FLINK-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-14890: --- Assignee: Alexander Fedulov > TestHarness for KeyedBroadcastProcessFunction > - > > Key: FLINK-14890 > URL: https://issues.apache.org/jira/browse/FLINK-14890 > Project: Flink > Issue Type: Improvement > Components: Tests >Affects Versions: 1.9.1 >Reporter: Jun Qin >Assignee: Alexander Fedulov >Priority: Minor > > To test {{KeyedCoProcessFunction}}, one can use {{KeyedCoProcessOperator}} > and {{KeyedTwoInputStreamOperatorTestHarness}}, to test > {{KeyedBroadcastProcessFunction}}, I see {{CoBroadcastWithKeyedOperator}}, > but the TestHarness class is missing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-14825) Rework state processor api documentation
[ https://issues.apache.org/jira/browse/FLINK-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber reassigned FLINK-14825:
---
Assignee: Seth Wiesman

> Rework state processor api documentation
>
> Key: FLINK-14825
> URL: https://issues.apache.org/jira/browse/FLINK-14825
> Project: Flink
> Issue Type: Improvement
> Components: API / State Processor, Documentation
> Reporter: Seth Wiesman
> Assignee: Seth Wiesman
> Priority: Major
> Fix For: 1.10.0
>
> The current version of the state processor API docs was rushed to meet the
> 1.9 release. We should rewrite them to be more complete, include better
> examples, etc.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-13727) Build docs with jekyll 4.0.0 (final)
[ https://issues.apache.org/jira/browse/FLINK-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13727: Description: When Jekyll 4.0.0 is out, we should upgrade to this final version and discontinue using the beta. When we make this final, we could also follow these official recommendations: {quote} - This version of Jekyll comes with some major changes. Most notably: * Our `link` tag now comes with the `relative_url` filter incorporated into it. You should no longer prepend `{{ site.baseurl }}` to `% link foo.md %` For further details: https://github.com/jekyll/jekyll/pull/6727 * Our `post_url` tag now comes with the `relative_url` filter incorporated into it. You shouldn't prepend `{{ site.baseurl }}` to `% post_url 2019-03-27-hello %` For further details: https://github.com/jekyll/jekyll/pull/7589 * Support for deprecated configuration options has been removed. We will no longer output a warning and gracefully assign their values to the newer counterparts internally. - {quote} was: When Jekyll 4.0.0 is out, we should upgrade to this final version and discontinue using the beta. When we make this final, we could also follow these official recommendations: {quote} - This version of Jekyll comes with some major changes. Most notably: * Our `link` tag now comes with the `relative_url` filter incorporated into it. You should no longer prepend `{{ site.baseurl }}` to `{% link foo.md %}` For further details: https://github.com/jekyll/jekyll/pull/6727 * Our `post_url` tag now comes with the `relative_url` filter incorporated into it. You shouldn't prepend `{{ site.baseurl }}` to `{% post_url 2019-03-27-hello %}` For further details: https://github.com/jekyll/jekyll/pull/7589 * Support for deprecated configuration options has been removed. We will no longer output a warning and gracefully assign their values to the newer counterparts internally. 
- {quote} > Build docs with jekyll 4.0.0 (final) > > > Key: FLINK-13727 > URL: https://issues.apache.org/jira/browse/FLINK-13727 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When Jekyll 4.0.0 is out, we should upgrade to this final version and > discontinue using the beta. > When we make this final, we could also follow these official recommendations: > {quote} > - > This version of Jekyll comes with some major changes. > Most notably: > * Our `link` tag now comes with the `relative_url` filter incorporated into > it. > You should no longer prepend `{{ site.baseurl }}` to `% link foo.md > %` > For further details: https://github.com/jekyll/jekyll/pull/6727 > * Our `post_url` tag now comes with the `relative_url` filter incorporated > into it. > You shouldn't prepend `{{ site.baseurl }}` to `% post_url > 2019-03-27-hello %` > For further details: https://github.com/jekyll/jekyll/pull/7589 > * Support for deprecated configuration options has been removed. We will no > longer > output a warning and gracefully assign their values to the newer > counterparts > internally. > - > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-13722) Speed up documentation generation
[ https://issues.apache.org/jira/browse/FLINK-13722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974484#comment-16974484 ]

Nico Kruber commented on FLINK-13722:
-

[~chesnay] Thanks a lot - it actually works well now and the new test setup was very nice.

> Speed up documentation generation
>
> Key: FLINK-13722
> URL: https://issues.apache.org/jira/browse/FLINK-13722
> Project: Flink
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 1.10.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Major
>
> Creating the documentation via {{build_docs.sh}} currently takes about 150s!

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-13791) Speed up sidenav by using group_by
[ https://issues.apache.org/jira/browse/FLINK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-13791. --- Fix Version/s: 1.9.2 1.8.3 Resolution: Fixed Fixed via: - master: a8868dd2219468c4528011c85551a33b4fe0ee9b - release-1.9: 4e86b3efc01811f46355533bba5cb980e4140b2e - release-1.8: 0b43d8dc9b9dfb0b8cfe255ff95f05b55fd4f85a > Speed up sidenav by using group_by > -- > > Key: FLINK-13791 > URL: https://issues.apache.org/jira/browse/FLINK-13791 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.8.3, 1.9.2 > > Time Spent: 20m > Remaining Estimate: 0h > > {{_includes/sidenav.html}} parses through {{pages_by_language}} over and over > again trying to find children when building the (recursive) side navigation. > We could do this once with a {{group_by}} instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-13726) Build docs with jekyll 4.0.0.pre.beta1
[ https://issues.apache.org/jira/browse/FLINK-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-13726. --- Fix Version/s: 1.9.2 1.8.3 Resolution: Fixed Fixed via: - master: cb7e9049491c139c008fa6755a38df7073dacec1 - release-1.9: 345abdff83420cc8f84231f32732352677eb8c91 - release-1.8: adbf065ec2660ee63e282f0b6831d41d77d75f46 > Build docs with jekyll 4.0.0.pre.beta1 > -- > > Key: FLINK-13726 > URL: https://issues.apache.org/jira/browse/FLINK-13726 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.8.3, 1.9.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Jekyll 4 is way faster in generating the docs than jekyll 3 - probably due to > the newly introduced cache. Site generation time goes down by roughly a > factor of 2.5 even with the current beta version! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-13725) Use sassc for faster doc generation
[ https://issues.apache.org/jira/browse/FLINK-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-13725. --- Fix Version/s: 1.9.2 1.8.3 Resolution: Fixed Fixed via: - master: 135472e7f52c91ecbef8d8a331372daf9c4464ef - release-1.9: 129a21b74b4efce8897a31aa3bb1ea403f140b58 - release-1.8: ff954b568f9068d9fcb3f5c007dec296831ded0e > Use sassc for faster doc generation > --- > > Key: FLINK-13725 > URL: https://issues.apache.org/jira/browse/FLINK-13725 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.8.3, 1.9.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Jekyll requires {{sass}} but can optionally also use a C-based implementation > provided by {{sassc}}. Although we do not use sass directly, there may be > some indirect use inside jekyll. It doesn't seem to hurt to upgrade here. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-13729) Update website generation dependencies
[ https://issues.apache.org/jira/browse/FLINK-13729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974446#comment-16974446 ] Nico Kruber commented on FLINK-13729: - Fixed via: - master: 3f0f6f23f8a9559f706f4bc63d7806498ec4c128 - release-1.9: 8385159bb800cc8a17c0fad00db45856191d4090 - release-1.8: 70640a88b1f40aec7378a14ede3de1dde109f917 > Update website generation dependencies > -- > > Key: FLINK-13729 > URL: https://issues.apache.org/jira/browse/FLINK-13729 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The website generation dependencies are quite old. By upgrading some of them > we get improvements like a much nicer code highlighting and prepare for the > jekyll update of FLINK-13726 and FLINK-13727. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-13729) Update website generation dependencies
[ https://issues.apache.org/jira/browse/FLINK-13729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-13729. --- Fix Version/s: 1.9.2 1.8.3 Resolution: Fixed > Update website generation dependencies > -- > > Key: FLINK-13729 > URL: https://issues.apache.org/jira/browse/FLINK-13729 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.8.3, 1.9.2 > > Time Spent: 20m > Remaining Estimate: 0h > > The website generation dependencies are quite old. By upgrading some of them > we get improvements like a much nicer code highlighting and prepare for the > jekyll update of FLINK-13726 and FLINK-13727. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-13723) Use liquid-c for faster doc generation
[ https://issues.apache.org/jira/browse/FLINK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13723: Fix Version/s: 1.9.2 1.8.3 > Use liquid-c for faster doc generation > -- > > Key: FLINK-13723 > URL: https://issues.apache.org/jira/browse/FLINK-13723 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.8.3, 1.9.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Jekyll requires {{liquid}} and only optionally uses {{liquid-c}} if > available. The latter uses natively-compiled code and reduces generation time > by ~5% for me. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-13723) Use liquid-c for faster doc generation
[ https://issues.apache.org/jira/browse/FLINK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974441#comment-16974441 ] Nico Kruber commented on FLINK-13723: - Fixed via: - release-1.8: d9b0c4ba032000cea992d4a3eccf96b2cc6b8f43 - release-1.9: 4b1ef4dfe9155fb27bccf04fdc7bd9d4877bf93f > Use liquid-c for faster doc generation > -- > > Key: FLINK-13723 > URL: https://issues.apache.org/jira/browse/FLINK-13723 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Jekyll requires {{liquid}} and only optionally uses {{liquid-c}} if > available. The latter uses natively-compiled code and reduces generation time > by ~5% for me. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav
[ https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13724: Fix Version/s: (was: 1.9.1) 1.9.2 > Remove unnecessary whitespace from the docs' sidenav > > > Key: FLINK-13724 > URL: https://issues.apache.org/jira/browse/FLINK-13724 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.8.3, 1.9.2 > > Time Spent: 20m > Remaining Estimate: 0h > > The side navigation generates quite some white space that will end up in > every HTML page. Removing this reduces final page sizes and also improves > site generation speed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav
[ https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13724: Fix Version/s: 1.9.1 1.8.3 > Remove unnecessary whitespace from the docs' sidenav > > > Key: FLINK-13724 > URL: https://issues.apache.org/jira/browse/FLINK-13724 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.9.1, 1.8.3 > > Time Spent: 20m > Remaining Estimate: 0h > > The side navigation generates quite some white space that will end up in > every HTML page. Removing this reduces final page sizes and also improves > site generation speed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav
[ https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974379#comment-16974379 ] Nico Kruber commented on FLINK-13724: - Fixed in - release-1.9: 5bfbfc9d0ad1120da001f9911dca6834fb3a788c - release-1.8: 6d55ababf05f24070053cafd177c9b69cabeff60 > Remove unnecessary whitespace from the docs' sidenav > > > Key: FLINK-13724 > URL: https://issues.apache.org/jira/browse/FLINK-13724 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The side navigation generates quite some white space that will end up in > every HTML page. Removing this reduces final page sizes and also improves > site generation speed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-13728) Fix wrong closing tag order in sidenav
[ https://issues.apache.org/jira/browse/FLINK-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974378#comment-16974378 ] Nico Kruber commented on FLINK-13728: - Fixed in 1.8.3 via bd6b2e2eb527392e7b6100089fd83c212e976705 > Fix wrong closing tag order in sidenav > -- > > Key: FLINK-13728 > URL: https://issues.apache.org/jira/browse/FLINK-13728 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.8.1, 1.9.0, 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.9.1, 1.8.3 > > Time Spent: 20m > Remaining Estimate: 0h > > The order of closing HTML tags in the sidenav is wrong: instead of > {{}} it should be {{}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-13728) Fix wrong closing tag order in sidenav
[ https://issues.apache.org/jira/browse/FLINK-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13728: Fix Version/s: 1.8.3 > Fix wrong closing tag order in sidenav > -- > > Key: FLINK-13728 > URL: https://issues.apache.org/jira/browse/FLINK-13728 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.8.1, 1.9.0, 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0, 1.9.1, 1.8.3 > > Time Spent: 20m > Remaining Estimate: 0h > > The order of closing HTML tags in the sidenav is wrong: instead of > {{}} it should be {{}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-14781) [ZH] clarify that a RocksDB dependency in pom.xml may not be needed
[ https://issues.apache.org/jira/browse/FLINK-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-14781: --- Assignee: rdyang > [ZH] clarify that a RocksDB dependency in pom.xml may not be needed > --- > > Key: FLINK-14781 > URL: https://issues.apache.org/jira/browse/FLINK-14781 > Project: Flink > Issue Type: Bug > Components: chinese-translation, Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: rdyang >Priority: Major > > The English version was clarified with respect to when and how to add the maven > dependencies via > https://github.com/apache/flink/commit/d36ce5ff77fae2b01b8fbe8e5c15d610de8ed9f5. > The Chinese version still needs that update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-14781) [ZH] clarify that a RocksDB dependency in pom.xml may not be needed
[ https://issues.apache.org/jira/browse/FLINK-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-14781: Summary: [ZH] clarify that a RocksDB dependency in pom.xml may not be needed (was: clarify that a RocksDB dependency in pom.xml may not be needed) > [ZH] clarify that a RocksDB dependency in pom.xml may not be needed > --- > > Key: FLINK-14781 > URL: https://issues.apache.org/jira/browse/FLINK-14781 > Project: Flink > Issue Type: Bug > Components: chinese-translation, Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Priority: Major > > The English version was clarified with respect to when and how to add the maven > dependencies via > https://github.com/apache/flink/commit/d36ce5ff77fae2b01b8fbe8e5c15d610de8ed9f5. > The Chinese version still needs that update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14781) clarify that a RocksDB dependency in pom.xml may not be needed
Nico Kruber created FLINK-14781: --- Summary: clarify that a RocksDB dependency in pom.xml may not be needed Key: FLINK-14781 URL: https://issues.apache.org/jira/browse/FLINK-14781 Project: Flink Issue Type: Bug Components: chinese-translation, Documentation Affects Versions: 1.10.0 Reporter: Nico Kruber The English version was clarified with respect to when and how to add the maven dependencies via https://github.com/apache/flink/commit/d36ce5ff77fae2b01b8fbe8e5c15d610de8ed9f5. The Chinese version still needs that update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14575) Wrong (parent-first) class loader during serialization while submitting jobs
Nico Kruber created FLINK-14575: --- Summary: Wrong (parent-first) class loader during serialization while submitting jobs Key: FLINK-14575 URL: https://issues.apache.org/jira/browse/FLINK-14575 Project: Flink Issue Type: Bug Components: Client / Job Submission Affects Versions: 1.9.1, 1.8.2 Reporter: Nico Kruber When building the user code classloader for job submission, Flink uses a parent-first class loader for serializing the ExecutionConfig which may lead to problems in the following case: # have hadoop in the system class loader from lib/ (this also provides avro 1.8.3) # have a user jar with a newer avro, e.g. 1.9.1 # register an Avro class with the execution config, e.g. through {{registerPojoType}} (please ignore for a second that this is not needed) During submission, a parent-first classloader will be used and thus, avro 1.8.3 will be used which does not match the version in the user classloader that will be used for deserialization. Exception during submission: {code} Caused by: java.io.InvalidClassException: org.apache.avro.specific.SpecificRecordBase; local class incompatible: stream classdesc serialVersionUID = 189988654766568477, local class serialVersionUID = -1463700717714793795 at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751) at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1716) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1556) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at java.util.HashSet.readObject(HashSet.java:341) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:566) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:552) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:540) at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58) at org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:278) at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:83) at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:37) at org.apache.flink.runtime.jobmaster.JobManagerRunner.(JobManagerRunner.java:146) ... 10 more {code} The incriminating code is in * Flink 1.8.0: {{org.apache.flink.client.program.JobWithJars#buildUserCodeClassLoader}} * Flink master: {{org.apache.flink.client.ClientUtils#buildUserCodeClassLoader}} Thanks [~chesnay] for looking into this with me. [~aljoscha] Do you know why we use parent-first there? -- This message was sent by Atlassian Jira (v8.3.4#803005)
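For readers unfamiliar with the two delegation models: a parent-first loader asks the parent (lib/) before its own URLs, so the older avro wins during deserialization; a child-first loader does the opposite. The following toy loader only illustrates that idea and is not Flink's actual user-code classloader:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Toy child-first classloader: tries its own URLs before delegating to the
// parent, so a user-jar class (e.g. a newer Avro) wins over the lib/ copy.
// (A real implementation would always delegate java.* etc. to the parent.)
public class ChildFirstClassLoader extends URLClassLoader {
    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Look in this loader's own URLs (the user jar) first ...
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // ... and only fall back to the parent-first path if absent.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    // Helper for quick experiments: can this loader resolve the given class?
    public static boolean canLoad(String name) {
        try (ChildFirstClassLoader cl =
                new ChildFirstClassLoader(new URL[0], ClassLoader.getSystemClassLoader())) {
            return cl.loadClass(name) == Class.forName(name);
        } catch (Exception e) {
            return false;
        }
    }
}
```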
[jira] [Closed] (FLINK-7002) Partitioning broken if enum is used in compound key specified using field expression
[ https://issues.apache.org/jira/browse/FLINK-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-7002. -- Resolution: Won't Fix Actually, this is not a Flink issue, but an issue of enums in Java and their implementation of {{hashCode}} which relies on the enum instance's memory address and therefore may be different in each JVM. You could instead use the enum's ordinal or its name in the key selector implementation. Please also refer to this for some more info: https://stackoverflow.com/questions/49140654/flink-error-key-group-is-not-in-keygrouprange > Partitioning broken if enum is used in compound key specified using field > expression > > > Key: FLINK-7002 > URL: https://issues.apache.org/jira/browse/FLINK-7002 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Affects Versions: 1.2.0, 1.3.1 >Reporter: Sebastian Klemke >Priority: Major > Attachments: TestJob.java, WorkingTestJob.java, testdata.avro > > > When groupBy() or keyBy() is used with multiple field expressions, at least > one of them being an enum type serialized using EnumTypeInfo, partitioning > seems random, resulting in incorrectly grouped/keyed output > datasets/datastreams. > The attached Flink DataSet API jobs and the test dataset detail the issue: > Both jobs count (id, type) occurrences, TestJob uses field expressions to > group, WorkingTestJob uses a KeySelector function. > Expected output for both is 6 records, with frequency value 100_000 each. If > you run in LocalEnvironment, results are in fact equivalent. But when run on > a cluster with 5 TaskManagers, only KeySelector function with String key > produces correct results whereas field expressions produce random, > non-repeatable, wrong results. -- This message was sent by Atlassian Jira (v8.3.4#803005)
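A sketch of the suggested workaround, with made-up names ({{Event}}, {{Type}} and the key format are illustrative only): derive the key from the enum's stable {{name()}} (or {{ordinal()}}) rather than from the enum instance, whose identity-based {{hashCode}} differs between JVMs:

```java
// Keying on enum.name() instead of the enum instance gives a key whose
// hashCode is stable across JVMs, unlike Enum's identity-based hashCode.
public class EnumKeying {
    public enum Type { CLICK, VIEW }             // hypothetical event type
    public record Event(long id, Type type) {}   // hypothetical event

    // Analogous to a Flink KeySelector<Event, String>: build the compound
    // key from the id and the enum's name, both stable across JVMs.
    public static String stableKey(Event e) {
        return e.id() + "|" + e.type().name();
    }
}
```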
[jira] [Updated] (FLINK-5334) outdated scala SBT quickstart example
[ https://issues.apache.org/jira/browse/FLINK-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-5334: --- Component/s: Documentation > outdated scala SBT quickstart example > - > > Key: FLINK-5334 > URL: https://issues.apache.org/jira/browse/FLINK-5334 > Project: Flink > Issue Type: Bug > Components: Documentation, Quickstarts >Affects Versions: 1.7.0 >Reporter: Nico Kruber >Priority: Major > > The scala quickstart set up via sbt-quickstart.sh or from the repository at > https://github.com/tillrohrmann/flink-project seems outdated compared to what > is set up with the maven archetype, e.g. Job.scala vs. BatchJob.scala and > StreamingJob.scala. This should probably be updated and also the hard-coded > example in sbt-quickstart.sh on the web page could be removed in favour of > downloading the newest version, as the mvn command does. > see > https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/scala_api_quickstart.html > for these two paths (SBT vs. Maven) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-5334) outdated scala SBT quickstart example
[ https://issues.apache.org/jira/browse/FLINK-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-5334: --- Affects Version/s: 1.8.2 1.9.0 > outdated scala SBT quickstart example > - > > Key: FLINK-5334 > URL: https://issues.apache.org/jira/browse/FLINK-5334 > Project: Flink > Issue Type: Bug > Components: Documentation, Quickstarts >Affects Versions: 1.7.0, 1.8.2, 1.9.0 >Reporter: Nico Kruber >Priority: Major > > The scala quickstart set up via sbt-quickstart.sh or from the repository at > https://github.com/tillrohrmann/flink-project seems outdated compared to what > is set up with the maven archetype, e.g. Job.scala vs. BatchJob.scala and > StreamingJob.scala. This should probably be updated and also the hard-coded > example in sbt-quickstart.sh on the web page could be removed in favour of > downloading the newest version, as the mvn command does. > see > https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/scala_api_quickstart.html > for these two paths (SBT vs. Maven) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-5334) outdated scala SBT quickstart example
[ https://issues.apache.org/jira/browse/FLINK-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936183#comment-16936183 ] Nico Kruber commented on FLINK-5334: Actually, the script asks for a Flink and Scala version but then does not take them into account when creating the example project. > outdated scala SBT quickstart example > - > > Key: FLINK-5334 > URL: https://issues.apache.org/jira/browse/FLINK-5334 > Project: Flink > Issue Type: Bug > Components: Quickstarts >Affects Versions: 1.7.0 >Reporter: Nico Kruber >Priority: Major > > The scala quickstart set up via sbt-quickstart.sh or from the repository at > https://github.com/tillrohrmann/flink-project seems outdated compared to what > is set up with the maven archetype, e.g. Job.scala vs. BatchJob.scala and > StreamingJob.scala. This should probably be updated and also the hard-coded > example in sbt-quickstart.sh on the web page could be removed in favour of > downloading the newest version, as the mvn command does. > see > https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/scala_api_quickstart.html > for these two paths (SBT vs. Maven) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14104) Bump Jackson to 2.9.9.3
Nico Kruber created FLINK-14104: --- Summary: Bump Jackson to 2.9.9.3 Key: FLINK-14104 URL: https://issues.apache.org/jira/browse/FLINK-14104 Project: Flink Issue Type: Bug Components: BuildSystem / Shaded Affects Versions: shaded-8.0, shaded-7.0 Reporter: Nico Kruber Assignee: Nico Kruber Our current Jackson version (2.9.8) is vulnerable for at least this CVE: https://nvd.nist.gov/vuln/detail/CVE-2019-14379 Bumping to 2.9.9.3 should solve it. See https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-13771) Support kqueue Netty transports (MacOS)
[ https://issues.apache.org/jira/browse/FLINK-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923797#comment-16923797 ] Nico Kruber commented on FLINK-13771: - [~aitozi] I'm not working on this and I also do not know how much it is worth since Mac servers (running Flink, in particular) are not really widespread, afaik. However, the actual implementation overhead should be low. I'll assign you to the issue and can have a look at the PR when you are done. > Support kqueue Netty transports (MacOS) > --- > > Key: FLINK-13771 > URL: https://issues.apache.org/jira/browse/FLINK-13771 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Reporter: Nico Kruber >Priority: Major > > It seems like Netty is now also supporting MacOS's native transport > {{kqueue}}: > https://netty.io/wiki/native-transports.html#using-the-macosbsd-native-transport > We should allow this via {{taskmanager.network.netty.transport}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (FLINK-13771) Support kqueue Netty transports (MacOS)
[ https://issues.apache.org/jira/browse/FLINK-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-13771: --- Assignee: Aitozi > Support kqueue Netty transports (MacOS) > --- > > Key: FLINK-13771 > URL: https://issues.apache.org/jira/browse/FLINK-13771 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Reporter: Nico Kruber >Assignee: Aitozi >Priority: Major > > It seems like Netty is now also supporting MacOS's native transport > {{kqueue}}: > https://netty.io/wiki/native-transports.html#using-the-macosbsd-native-transport > We should allow this via {{taskmanager.network.netty.transport}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-12122) Spread out tasks evenly across all available registered TaskManagers
[ https://issues.apache.org/jira/browse/FLINK-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922710#comment-16922710 ] Nico Kruber commented on FLINK-12122: - [~anton_ryabtsev] true, memory could become an issue in certain scenarios. However, I don't get the GC part: native memory for threads shouldn't be part of GC, network buffers are pre-allocated and off-heap, and the task's load is, in sum, the same, just more widely spread. > Spread out tasks evenly across all available registered TaskManagers > > > Key: FLINK-12122 > URL: https://issues.apache.org/jira/browse/FLINK-12122 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.6.4, 1.7.2, 1.8.0 >Reporter: Till Rohrmann >Priority: Major > Attachments: image-2019-05-21-12-28-29-538.png, > image-2019-05-21-13-02-50-251.png > > > With Flip-6, we changed the default behaviour how slots are assigned to > {{TaskManages}}. Instead of evenly spreading it out over all registered > {{TaskManagers}}, we randomly pick slots from {{TaskManagers}} with a > tendency to first fill up a TM before using another one. This is a regression > wrt the pre Flip-6 code. > I suggest to change the behaviour so that we try to evenly distribute slots > across all available {{TaskManagers}} by considering how many of their slots > are already allocated. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-10177) Use network transport type AUTO by default
[ https://issues.apache.org/jira/browse/FLINK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913127#comment-16913127 ] Nico Kruber commented on FLINK-10177: - One thing to consider/test: if, for whatever reason one TM would end up with NIO and another with epoll, they should theoretically work together but this should be verified. After all, this is just the local channel listening implementation. On the other hand, most deployments should be homogeneous and therefore not end up in that scenario. > Use network transport type AUTO by default > -- > > Key: FLINK-10177 > URL: https://issues.apache.org/jira/browse/FLINK-10177 > Project: Flink > Issue Type: Improvement > Components: Runtime / Configuration, Runtime / Network >Affects Versions: 1.6.0, 1.7.0 >Reporter: Nico Kruber >Assignee: boshu Zheng >Priority: Major > > Now that the shading issue with the native library is fixed (FLINK-9463), > EPOLL should be available on (all?) Linux distributions and provide some > efficiency gain (if enabled). Therefore, > {{taskmanager.network.netty.transport}} should be set to {{auto}} by default. > If EPOLL is not available, it will automatically fall back to NIO which > currently is the default. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-13770) Bump Netty to 4.1.39.Final
[ https://issues.apache.org/jira/browse/FLINK-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913124#comment-16913124 ] Nico Kruber commented on FLINK-13770: - see https://issues.apache.org/jira/browse/FLINK-10177 > Bump Netty to 4.1.39.Final > -- > > Key: FLINK-13770 > URL: https://issues.apache.org/jira/browse/FLINK-13770 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I quickly went through all the changelogs for Netty 4.1.32 (which we > currently use) to the latest Netty 4.1.39.Final. Below, you will find a > list of bug fixes and performance improvements that may affect us. Nice > changes we could benefit from, also for the Java > 8 efforts. The most > important ones fixing leaks etc are #8921, #9167, #9274, #9394, and the > various {{CompositeByteBuf}} fixes. The rest are mostly performance > improvements. > Since we are still early in the dev cycle for Flink 1.10, it would be > nice to update now and verify that the new version works correctly. > {code} > Netty 4.1.33.Final > - Fix ClassCastException and native crash when using kqueue transport > (#8665) > - Provide a way to cache the internal nioBuffer of the PooledByteBuffer > to reduce GC (#8603) > Netty 4.1.34.Final > - Do not use GetPrimitiveArrayCritical(...) 
due multiple not-fixed bugs > related to GCLocker (#8921) > - Correctly monkey-patch id also in whe os / arch is used within library > name (#8913) > - Further reduce ensureAccessible() overhead (#8895) > - Support using an Executor to offload blocking / long-running tasks > when processing TLS / SSL via the SslHandler (#8847) > - Minimize memory footprint for AbstractChannelHandlerContext for > handlers that execute in the EventExecutor (#8786) > - Fix three bugs in CompositeByteBuf (#8773) > Netty 4.1.35.Final > - Fix possible ByteBuf leak when CompositeByteBuf is resized (#8946) > - Correctly produce ssl alert when certificate validation fails on the > client-side when using native SSL implementation (#8949) > Netty 4.1.37.Final > - Don't filter out TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (#9274) > - Try to mark child channel writable again once the parent channel > becomes writable (#9254) > - Properly debounce wakeups (#9191) > - Don't read from timerfd and eventfd on each EventLoop tick (#9192) > - Correctly detect that KeyManagerFactory is not supported when using > OpenSSL 1.1.0+ (#9170) > - Fix possible unsafe sharing of internal NIO buffer in CompositeByteBuf > (#9169) > - KQueueEventLoop won't unregister active channels reusing a file > descriptor (#9149) > - Prefer direct io buffers if direct buffers pooled (#9167) > Netty 4.1.38.Final > - Prevent ByteToMessageDecoder from overreading when !isAutoRead (#9252) > - Correctly take length of ByteBufInputStream into account for > readLine() / readByte() (#9310) > - availableSharedCapacity will be slowly exhausted (#9394) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Closed] (FLINK-12983) Replace descriptive histogram's storage back-end
[ https://issues.apache.org/jira/browse/FLINK-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-12983. --- Fix Version/s: 1.10.0 Resolution: Fixed merged to master via f57a615 > Replace descriptive histogram's storage back-end > > > Key: FLINK-12983 > URL: https://issues.apache.org/jira/browse/FLINK-12983 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Metrics >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > {{DescriptiveStatistics}} relies on their {{ResizableDoubleArray}} for > storing double values for their histograms. However, this is constantly > resizing an internal array and seems to have quite some overhead. > Additionally, we're not using {{SynchronizedDescriptiveStatistics}} which, > according to its docs, we should. Currently, we seem to be somewhat safe > because {{ResizableDoubleArray}} has some synchronized parts but these are > scheduled to go away with commons.math version 4. 
> Internal tests with the current implementation, one based on a linear array > of twice the histogram size (and moving values back to the start once the > window reaches the end), and one using a circular array (wrapping around with > flexible start position) has shown these numbers using the optimised code > from FLINK-10236, FLINK-12981, and FLINK-12982: > # only adding values to the histogram > {code} > Benchmark Mode Cnt Score > Error Units > HistogramBenchmarks.dropwizardHistogramAdd thrpt 30 47985.359 ± > 25.847 ops/ms > HistogramBenchmarks.descriptiveHistogramAddthrpt 30 70158.792 ± > 276.858 ops/ms > --- with FLINK-10236, FLINK-12981, and FLINK-12982 --- > HistogramBenchmarks.descriptiveHistogramAddthrpt 30 75303.040 ± > 475.355 ops/ms > HistogramBenchmarks.descrHistogramCircularAdd thrpt 30 200906.902 ± > 384.483 ops/ms > HistogramBenchmarks.descrHistogramLinearAddthrpt 30 189788.728 ± > 233.283 ops/ms > {code} > # after adding each value, also retrieving a common set of metrics: > {code} > Benchmark Mode Cnt Score > Error Units > HistogramBenchmarks.dropwizardHistogramthrpt 30 400.274 ± > 4.930 ops/ms > HistogramBenchmarks.descriptiveHistogram thrpt 30 124.533 ± > 1.060 ops/ms > --- with FLINK-10236, FLINK-12981, and FLINK-12982 --- > HistogramBenchmarks.descriptiveHistogram thrpt 30 251.895 ± > 1.809 ops/ms > HistogramBenchmarks.descrHistogramCircular thrpt 30 301.068 ± > 2.077 ops/ms > HistogramBenchmarks.descrHistogramLinear thrpt 30 234.050 ± > 5.485 ops/ms > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
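A minimal sketch of the circular-array variant benchmarked above (illustrative only, not the code that was merged): a fixed-size ring buffer whose {{add}} is O(1) and never resizes, overwriting the oldest sample once the window is full:

```java
// Ring buffer of the last 'capacity' samples: add() is O(1) with no
// resizing, and snapshot() returns the window contents for statistics.
public class CircularDoubleArray {
    private final double[] data;
    private int next = 0;   // next write position (wraps around)
    private int size = 0;   // number of valid samples (<= capacity)

    public CircularDoubleArray(int capacity) {
        this.data = new double[capacity];
    }

    public void add(double value) {
        data[next] = value;
        next = (next + 1) % data.length;  // wrap, overwriting the oldest
        if (size < data.length) {
            size++;
        }
    }

    public double[] snapshot() {
        double[] out = new double[size];
        // Once the buffer has wrapped, the oldest sample sits at 'next'.
        int start = (size == data.length) ? next : 0;
        for (int i = 0; i < size; i++) {
            out[i] = data[(start + i) % data.length];
        }
        return out;
    }
}
```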
[jira] [Closed] (FLINK-12982) Make DescriptiveStatisticsHistogramStatistics a true point-in-time snapshot
[ https://issues.apache.org/jira/browse/FLINK-12982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-12982. --- Fix Version/s: 1.10.0 Resolution: Fixed merged in master via 4452be3 > Make DescriptiveStatisticsHistogramStatistics a true point-in-time snapshot > --- > > Key: FLINK-12982 > URL: https://issues.apache.org/jira/browse/FLINK-12982 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Metrics >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Instead of redirecting {{DescriptiveStatisticsHistogramStatistics}} calls to > {{DescriptiveStatistics}}, it takes a point-in-time snapshot using its own > {{UnivariateStatistic}} implementation that > * calculates min, max, mean, and standard deviation in one go (as opposed to > four iterations over the values array!) > * caches pivots for the percentile calculation to speed up retrieval of > multiple percentiles/quartiles > This is also similar to the semantics of our implementation using Codahale's > {{Dropwizard}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
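The "in one go" computation from the first bullet can be sketched as a single pass that accumulates min, max, sum, and sum of squares. This is a deliberately naive illustration of the idea, not the actual `UnivariateStatistic` implementation; a production version would likely use a numerically more stable algorithm:

```java
// Illustrative one-pass summary: min, max, mean and (sample) standard
// deviation from a single iteration over the values array, instead of four
// separate passes.
final class OnePassSummary {
    final double min, max, mean, stddev;

    OnePassSummary(double[] values) {
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
        double sum = 0.0, sumSq = 0.0;
        for (double v : values) { // the single pass
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
            sum += v;
            sumSq += v * v;
        }
        int n = values.length;
        this.min = lo;
        this.max = hi;
        this.mean = sum / n;
        // sample variance: (sumSq - n * mean^2) / (n - 1)
        this.stddev = (n > 1) ? Math.sqrt((sumSq - n * mean * mean) / (n - 1)) : 0.0;
    }
}
```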
[jira] [Closed] (FLINK-12981) Ignore NaN values in histogram's percentile implementation
[ https://issues.apache.org/jira/browse/FLINK-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-12981. --- Fix Version/s: 1.10.0 Resolution: Fixed merged into master via e59b9d2 > Ignore NaN values in histogram's percentile implementation > -- > > Key: FLINK-12981 > URL: https://issues.apache.org/jira/browse/FLINK-12981 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Metrics >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Histogram metrics use {{long}} values; therefore, there is no {{Double.NaN}} > in {{DescriptiveStatistics}}' data and no need to cleanse it while working > with it. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (FLINK-13793) Build different language docs in parallel
Nico Kruber created FLINK-13793: --- Summary: Build different language docs in parallel Key: FLINK-13793 URL: https://issues.apache.org/jira/browse/FLINK-13793 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Nico Kruber Assignee: Nico Kruber Unfortunately, jekyll does not support parallel builds and thus leaves available resources unused. In the special case of building the documentation without serving it, we could build each language (en, zh) in a separate sub-process and thus get some parallelization. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (FLINK-13791) Speed up sidenav by using group_by
Nico Kruber created FLINK-13791: --- Summary: Speed up sidenav by using group_by Key: FLINK-13791 URL: https://issues.apache.org/jira/browse/FLINK-13791 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Nico Kruber Assignee: Nico Kruber {{_includes/sidenav.html}} parses through {{pages_by_language}} over and over again trying to find children when building the (recursive) side navigation. We could do this once with a {{group_by}} instead. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Closed] (FLINK-12984) Only call Histogram#getStatistics() once per set of retrieved statistics
[ https://issues.apache.org/jira/browse/FLINK-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-12984. --- Resolution: Fixed Fix Version/s: 1.10.0 fixed on master via d9f012746f5b8b36ebb416f70e9f5bac93538d5d > Only call Histogram#getStatistics() once per set of retrieved statistics > > > Key: FLINK-12984 > URL: https://issues.apache.org/jira/browse/FLINK-12984 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Metrics >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > On some occasions, {{Histogram#getStatistics()}} was called multiple times to > retrieve different statistics. However, at least the Dropwizard > implementation has some constant overhead per call and we should rather > interpret this method as returning a point-in-time snapshot of the histogram > in order to get consistent values when querying them. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Closed] (FLINK-12987) DescriptiveStatisticsHistogram#getCount does not return the number of elements seen
[ https://issues.apache.org/jira/browse/FLINK-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-12987. --- Resolution: Fixed Fix Version/s: 1.10.0 fixed on master via fd9ef60cc8448a5f4d1915973e168aad073d8e8d > DescriptiveStatisticsHistogram#getCount does not return the number of > elements seen > --- > > Key: FLINK-12987 > URL: https://issues.apache.org/jira/browse/FLINK-12987 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.6.4, 1.7.2, 1.8.0, 1.9.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > {{DescriptiveStatisticsHistogram#getCount()}} returns the number of elements > in the current window and not the number of total elements seen over time. In > contrast, {{DropwizardHistogramWrapper}} does this correctly. > We should unify the behaviour and add a unit test for it (there is no generic > histogram test yet). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
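The fix described above boils down to keeping a second counter next to the bounded sample window. A minimal, hypothetical sketch (illustrative names, not the actual `DescriptiveStatisticsHistogram` code):

```java
// Sketch of the unified semantics: a separate long counter tracks the total
// number of elements ever seen, independent of the bounded sample window,
// so getCount() behaves like DropwizardHistogramWrapper.
final class CountingWindowHistogram {
    private final double[] window;
    private int nextPos = 0;
    private long elementsSeen = 0; // total over time, NOT the window fill level

    CountingWindowHistogram(int windowSize) {
        this.window = new double[windowSize];
    }

    void update(long value) {
        window[nextPos] = value;
        nextPos = (nextPos + 1) % window.length;
        elementsSeen++;
    }

    long getCount() {
        // The buggy behaviour would effectively have been
        // min(elementsSeen, window.length) — the window fill level.
        return elementsSeen;
    }
}
```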
[jira] [Updated] (FLINK-12987) DescriptiveStatisticsHistogram#getCount does not return the number of elements seen
[ https://issues.apache.org/jira/browse/FLINK-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-12987: Affects Version/s: 1.9.0 > DescriptiveStatisticsHistogram#getCount does not return the number of > elements seen > --- > > Key: FLINK-12987 > URL: https://issues.apache.org/jira/browse/FLINK-12987 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.6.4, 1.7.2, 1.8.0, 1.9.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {{DescriptiveStatisticsHistogram#getCount()}} returns the number of elements > in the current window and not the number of total elements seen over time. In > contrast, {{DropwizardHistogramWrapper}} does this correctly. > We should unify the behaviour and add a unit test for it (there is no generic > histogram test yet). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (FLINK-13020) UT Failure: ChainLengthDecreaseTest
[ https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908534#comment-16908534 ] Nico Kruber edited comment on FLINK-13020 at 8/15/19 11:00 PM: --- Actually, I just encountered this error in a branch of mine which is based on [latest master|https://github.com/apache/flink/commit/428ce1b938813fba287a51bf86e6c52ef54453cb]. So either there has been a regression, or the fix does not work in all cases, or it is no duplicate after all: {code} 17:30:18.083 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.113 s <<< FAILURE! - in org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest 17:30:18.083 [ERROR] testMigrationAndRestore[Migrate Savepoint: 1.8](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest) Time elapsed: 0.268 s <<< ERROR! java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs {code} https://api.travis-ci.com/v3/job/225588484/log.txt {code} 17:30:17,408 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Configuring application-defined state backend with job/cluster config 17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (2/4) (ffb5e756d6acddab9cab76e2a0a32904) switched from DEPLOYING to RUNNING.
17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (4/4) (79fcf333d4d11eae297b65e52e397658) switched from DEPLOYING to RUNNING. 17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (2/4) (aedaa4a61e74a3b766fafbef46e6aea6) switched from DEPLOYING to RUNNING. 17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (4/4) (a1f07e2714e73b2533291a322961ea67) switched from DEPLOYING to RUNNING. 17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (3/4) (6073be38d7be0ee571558f1dc865837a) switched from DEPLOYING to RUNNING. 17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (1/4) (e4bc84d8137769b513d1a5107027500d) switched from DEPLOYING to RUNNING. 17:30:17,409 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Map (3/4) (6834950d9742da9c6a784ecc5ee892df) switched from DEPLOYING to RUNNING. 17:30:17,409 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint. 17:30:17,413 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint. 17:30:17,414 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint. 17:30:17,416 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint. 
17:30:17,417 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source (1/4) of job 075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. Aborting checkpoint. 17:30:17,423 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source (1/4) (8b302fefb0c10b7fd0b66f4fdb253632) switched from DEPLOYING to RUNNING. 17:30:17,423 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Using application-defined state backend: MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: UNDEFINED, maxStateSize: 5242880) 17:30:17,423 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Configuring application-defined state backend with job/cluster config 17:30:17,424 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/4)
[jira] [Updated] (FLINK-13020) UT Failure: ChainLengthDecreaseTest
[ https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13020: Affects Version/s: 1.10.0 > UT Failure: ChainLengthDecreaseTest > --- > > Key: FLINK-13020 > URL: https://issues.apache.org/jira/browse/FLINK-13020 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.10.0 >Reporter: Bowen Li >Priority: Major > > {code:java} > 05:47:24.893 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time > elapsed: 19.836 s <<< FAILURE! - in > org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest > 05:47:24.895 [ERROR] testMigrationAndRestore[Migrate Savepoint: > 1.3](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest) > Time elapsed: 1.501 s <<< ERROR! > java.util.concurrent.ExecutionException: > java.util.concurrent.CompletionException: > org.apache.flink.runtime.checkpoint.CheckpointException: Task received > cancellation from one of its inputs > Caused by: java.util.concurrent.CompletionException: > org.apache.flink.runtime.checkpoint.CheckpointException: Task received > cancellation from one of its inputs > Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task > received cancellation from one of its inputs > Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task > received cancellation from one of its inputs > ... > 05:48:27.736 [ERROR] Errors: > 05:48:27.736 [ERROR] > ChainLengthDecreaseTest>AbstractOperatorRestoreTestBase.testMigrationAndRestore:102->AbstractOperatorRestoreTestBase.migrateJob:138 > » Execution > 05:48:27.736 [INFO] > {code} > https://travis-ci.org/apache/flink/jobs/551053821 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (FLINK-13020) UT Failure: ChainLengthDecreaseTest
[ https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908534#comment-16908534 ] Nico Kruber commented on FLINK-13020: - Actually, I just encountered this error in a branch of mine which is based on [latest master|https://github.com/apache/flink/commit/428ce1b938813fba287a51bf86e6c52ef54453cb]. So either there has been a regression, or the fix does not work in all cases, or it is no duplicate after all: {code} 17:30:18.083 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.113 s <<< FAILURE! - in org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest 17:30:18.083 [ERROR] testMigrationAndRestore[Migrate Savepoint: 1.8](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest) Time elapsed: 0.268 s <<< ERROR! java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task received cancellation from one of its inputs {code} https://api.travis-ci.com/v3/job/225588484/log.txt > UT Failure: ChainLengthDecreaseTest > --- > > Key: FLINK-13020 > URL: https://issues.apache.org/jira/browse/FLINK-13020 > Project: Flink > Issue Type: Improvement >Reporter: Bowen Li >Priority: Major > > {code:java} > 05:47:24.893 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time > elapsed: 19.836 s <<< FAILURE!
- in > org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest > 05:47:24.895 [ERROR] testMigrationAndRestore[Migrate Savepoint: > 1.3](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest) > Time elapsed: 1.501 s <<< ERROR! > java.util.concurrent.ExecutionException: > java.util.concurrent.CompletionException: > org.apache.flink.runtime.checkpoint.CheckpointException: Task received > cancellation from one of its inputs > Caused by: java.util.concurrent.CompletionException: > org.apache.flink.runtime.checkpoint.CheckpointException: Task received > cancellation from one of its inputs > Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task > received cancellation from one of its inputs > Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task > received cancellation from one of its inputs > ... > 05:48:27.736 [ERROR] Errors: > 05:48:27.736 [ERROR] > ChainLengthDecreaseTest>AbstractOperatorRestoreTestBase.testMigrationAndRestore:102->AbstractOperatorRestoreTestBase.migrateJob:138 > » Execution > 05:48:27.736 [INFO] > {code} > https://travis-ci.org/apache/flink/jobs/551053821 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Reopened] (FLINK-13020) UT Failure: ChainLengthDecreaseTest
[ https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reopened FLINK-13020: - > UT Failure: ChainLengthDecreaseTest > --- > > Key: FLINK-13020 > URL: https://issues.apache.org/jira/browse/FLINK-13020 > Project: Flink > Issue Type: Improvement >Reporter: Bowen Li >Priority: Major > > {code:java} > 05:47:24.893 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time > elapsed: 19.836 s <<< FAILURE! - in > org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest > 05:47:24.895 [ERROR] testMigrationAndRestore[Migrate Savepoint: > 1.3](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest) > Time elapsed: 1.501 s <<< ERROR! > java.util.concurrent.ExecutionException: > java.util.concurrent.CompletionException: > org.apache.flink.runtime.checkpoint.CheckpointException: Task received > cancellation from one of its inputs > Caused by: java.util.concurrent.CompletionException: > org.apache.flink.runtime.checkpoint.CheckpointException: Task received > cancellation from one of its inputs > Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task > received cancellation from one of its inputs > Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task > received cancellation from one of its inputs > ... > 05:48:27.736 [ERROR] Errors: > 05:48:27.736 [ERROR] > ChainLengthDecreaseTest>AbstractOperatorRestoreTestBase.testMigrationAndRestore:102->AbstractOperatorRestoreTestBase.migrateJob:138 > » Execution > 05:48:27.736 [INFO] > {code} > https://travis-ci.org/apache/flink/jobs/551053821 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (FLINK-13727) Build docs with jekyll 4.0.0 (final)
[ https://issues.apache.org/jira/browse/FLINK-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13727: Description: When Jekyll 4.0.0 is out, we should upgrade to this final version and discontinue using the beta. When we make this final, we could also follow these official recommendations: {quote} - This version of Jekyll comes with some major changes. Most notably: * Our `link` tag now comes with the `relative_url` filter incorporated into it. You should no longer prepend `{{ site.baseurl }}` to `{% link foo.md %}` For further details: https://github.com/jekyll/jekyll/pull/6727 * Our `post_url` tag now comes with the `relative_url` filter incorporated into it. You shouldn't prepend `{{ site.baseurl }}` to `{% post_url 2019-03-27-hello %}` For further details: https://github.com/jekyll/jekyll/pull/7589 * Support for deprecated configuration options has been removed. We will no longer output a warning and gracefully assign their values to the newer counterparts internally. - {quote} was:When Jekyll 4.0.0 is out, we should upgrade to this final version and discontinue using the beta. > Build docs with jekyll 4.0.0 (final) > > > Key: FLINK-13727 > URL: https://issues.apache.org/jira/browse/FLINK-13727 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Priority: Major > > When Jekyll 4.0.0 is out, we should upgrade to this final version and > discontinue using the beta. > When we make this final, we could also follow these official recommendations: > {quote} > - > This version of Jekyll comes with some major changes. > Most notably: > * Our `link` tag now comes with the `relative_url` filter incorporated into > it. > You should no longer prepend `{{ site.baseurl }}` to `{% link foo.md %}` > For further details: https://github.com/jekyll/jekyll/pull/6727 > * Our `post_url` tag now comes with the `relative_url` filter incorporated > into it. 
> You shouldn't prepend `{{ site.baseurl }}` to `{% post_url > 2019-03-27-hello %}` > For further details: https://github.com/jekyll/jekyll/pull/7589 > * Support for deprecated configuration options has been removed. We will no > longer > output a warning and gracefully assign their values to the newer > counterparts > internally. > - > {quote} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13729) Update website generation dependencies
Nico Kruber created FLINK-13729: --- Summary: Update website generation dependencies Key: FLINK-13729 URL: https://issues.apache.org/jira/browse/FLINK-13729 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 1.10.0 Reporter: Nico Kruber Assignee: Nico Kruber The website generation dependencies are quite old. By upgrading some of them we get improvements like much nicer code highlighting and prepare for the jekyll update of FLINK-13726 and FLINK-13727. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13728) Fix wrong closing tag order in sidenav
Nico Kruber created FLINK-13728: --- Summary: Fix wrong closing tag order in sidenav Key: FLINK-13728 URL: https://issues.apache.org/jira/browse/FLINK-13728 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.8.1, 1.9.0, 1.10.0 Reporter: Nico Kruber Assignee: Nico Kruber The order of closing HTML tags in the sidenav is wrong: instead of {{}} it should be {{}} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav
[ https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13724: Description: The side navigation generates quite some white space that will end up in every HTML page. Removing this reduces final page sizes and also improves site generation speed. (was: The site navigation generates quite some white space that will end up in every HTML page. Removing this reduces final page sizes and also improved site generation speed.) > Remove unnecessary whitespace from the docs' sidenav > > > Key: FLINK-13724 > URL: https://issues.apache.org/jira/browse/FLINK-13724 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > > The side navigation generates quite some white space that will end up in > every HTML page. Removing this reduces final page sizes and also improves > site generation speed. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav
[ https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13724: Summary: Remove unnecessary whitespace from the docs' sidenav (was: Remove unnecessary whitespace from the docs' sitenav) > Remove unnecessary whitespace from the docs' sidenav > > > Key: FLINK-13724 > URL: https://issues.apache.org/jira/browse/FLINK-13724 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > > The site navigation generates quite some white space that will end up in > every HTML page. Removing this reduces final page sizes and also improves > site generation speed. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (FLINK-13726) Build docs with jekyll 4.0.0.pre.beta1
[ https://issues.apache.org/jira/browse/FLINK-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13726: Description: Jekyll 4 is way faster in generating the docs than jekyll 3 - probably due to the newly introduced cache. Site generation time goes down by roughly a factor of 2.5 even with the current beta version! (was: Jekyll 4 is way faster in generating the docs than jekyll 3 - probably due to the newly introduced cache. Site generation time goes down by roughly a factor of 2.5!) > Build docs with jekyll 4.0.0.pre.beta1 > -- > > Key: FLINK-13726 > URL: https://issues.apache.org/jira/browse/FLINK-13726 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.10.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > > Jekyll 4 is way faster in generating the docs than jekyll 3 - probably due to > the newly introduced cache. Site generation time goes down by roughly a > factor of 2.5 even with the current beta version! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13727) Build docs with jekyll 4.0.0 (final)
Nico Kruber created FLINK-13727: --- Summary: Build docs with jekyll 4.0.0 (final) Key: FLINK-13727 URL: https://issues.apache.org/jira/browse/FLINK-13727 Project: Flink Issue Type: Sub-task Components: Documentation Affects Versions: 1.10.0 Reporter: Nico Kruber When Jekyll 4.0.0 is out, we should upgrade to this final version and discontinue using the beta. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13726) Build docs with jekyll 4.0.0.pre.beta1
Nico Kruber created FLINK-13726: --- Summary: Build docs with jekyll 4.0.0.pre.beta1 Key: FLINK-13726 URL: https://issues.apache.org/jira/browse/FLINK-13726 Project: Flink Issue Type: Sub-task Components: Documentation Affects Versions: 1.10.0 Reporter: Nico Kruber Assignee: Nico Kruber Jekyll 4 is way faster in generating the docs than jekyll 3 - probably due to the newly introduced cache. Site generation time goes down by roughly a factor of 2.5! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13725) Use sassc for faster doc generation
Nico Kruber created FLINK-13725: --- Summary: Use sassc for faster doc generation Key: FLINK-13725 URL: https://issues.apache.org/jira/browse/FLINK-13725 Project: Flink Issue Type: Sub-task Components: Documentation Affects Versions: 1.10.0 Reporter: Nico Kruber Assignee: Nico Kruber Jekyll requires {{sass}} but can optionally also use a C-based implementation provided by {{sassc}}. Although we do not use sass directly, there may be some indirect use inside jekyll. It doesn't seem to hurt to upgrade here. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13724) Remove unnecessary whitespace from the docs' sitenav
Nico Kruber created FLINK-13724: --- Summary: Remove unnecessary whitespace from the docs' sitenav Key: FLINK-13724 URL: https://issues.apache.org/jira/browse/FLINK-13724 Project: Flink Issue Type: Sub-task Components: Documentation Affects Versions: 1.10.0 Reporter: Nico Kruber Assignee: Nico Kruber The site navigation generates quite some white space that will end up in every HTML page. Removing this reduces final page sizes and also improves site generation speed. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13723) Use liquid-c for faster doc generation
Nico Kruber created FLINK-13723: --- Summary: Use liquid-c for faster doc generation Key: FLINK-13723 URL: https://issues.apache.org/jira/browse/FLINK-13723 Project: Flink Issue Type: Sub-task Components: Documentation Affects Versions: 1.10.0 Reporter: Nico Kruber Assignee: Nico Kruber Jekyll requires {{liquid}} and only optionally uses {{liquid-c}} if available. The latter uses natively-compiled code and reduces generation time by ~5% for me. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13722) Speed up documentation generation
Nico Kruber created FLINK-13722: --- Summary: Speed up documentation generation Key: FLINK-13722 URL: https://issues.apache.org/jira/browse/FLINK-13722 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 1.10.0 Reporter: Nico Kruber Assignee: Nico Kruber Creating the documentation via {{build_docs.sh}} currently takes about 150s! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (FLINK-13537) Changing Kafka producer pool size and scaling out may create overlapping transaction IDs
[ https://issues.apache.org/jira/browse/FLINK-13537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13537: Description: The Kafka producer's transaction IDs are only generated once when there was no previous state for that operator. In the case where we restore and increase parallelism (scale-out), some operators may not have previous state and create new IDs. Now, if we also reduce the {{poolSize}}, these new IDs may overlap with the old ones which should never happen! Similarly, a scale-in + increasing {{poolSize}} could lead to the same thing. An easy "fix" for this would be to forbid changing the {{poolSize}}. We could potentially be a bit better by only forbidding changes that can lead to transaction ID overlaps which we can identify from the formulae that {{TransactionalIdsGenerator}} uses. This should probably be the first step which can also be back-ported to older Flink versions just in case. On a side note, the current scheme also relies on the fact that the operator's list state distributes previous states during scale-out in a fashion that only the operators with the highest subtask indices do not get a previous state. This is somewhat "guaranteed" by {{OperatorStateStore#getListState()}} but I'm not sure whether we should actually rely on that there. was: The Kafka producer's transaction IDs are only generated once when there was no previous state for that operator. In the case where we restore and increase parallelism (scale-out), some operators may not have previous state and create new IDs. Now, if we also reduce the poolSize, these new IDs may overlap with the old ones which should never happen! On a side note, the current scheme also relies on the fact, that the operator's list state distributes previous states during scale-out in a fashion that only the operators with the highest subtask indices do not get a previous state.
This is somewhat "guaranteed" by {{OperatorStateStore#getListState()}} but I'm not sure whether we should actually rely on that there. > Changing Kafka producer pool size and scaling out may create overlapping > transaction IDs > > > Key: FLINK-13537 > URL: https://issues.apache.org/jira/browse/FLINK-13537 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.8.1, 1.9.0 >Reporter: Nico Kruber >Priority: Major > > The Kafka producer's transaction IDs are only generated once when there was > no previous state for that operator. In the case where we restore and > increase parallelism (scale-out), some operators may not have previous state > and create new IDs. Now, if we also reduce the {{poolSize}}, these new IDs > may overlap with the old ones which should never happen! Similarly, a > scale-in + increasing {{poolSize}} could lead to the same thing. > An easy "fix" for this would be to forbid changing the {{poolSize}}. We could > potentially be a bit better by only forbidding changes that can lead to > transaction ID overlaps which we can identify from the formulae that > {{TransactionalIdsGenerator}} uses. This should probably be the first step > which can also be back-ported to older Flink versions just in case. > > On a side note, the current scheme also relies on the fact that the > operator's list state distributes previous states during scale-out in a > fashion that only the operators with the highest subtask indices do not get a > previous state. This is somewhat "guaranteed" by > {{OperatorStateStore#getListState()}} but I'm not sure whether we should > actually rely on that there. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
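To see how shrinking {{poolSize}} while scaling out can collide, assume the contiguous-range scheme the issue alludes to, where subtask i owns the ID range [i * poolSize, (i + 1) * poolSize). This formula is an assumption for the sake of the example, not a verbatim copy of {{TransactionalIdsGenerator}}:

```java
// Illustrative model of contiguous transaction-ID ranges per subtask,
// assuming subtask i owns [i * poolSize, (i + 1) * poolSize).
final class IdRange {
    final long start; // inclusive
    final long end;   // exclusive

    IdRange(int subtaskIndex, int poolSize) {
        this.start = (long) subtaskIndex * poolSize;
        this.end = start + poolSize;
    }

    boolean overlaps(IdRange other) {
        // two half-open intervals intersect iff each starts before the other ends
        return start < other.end && other.start < end;
    }
}
```

Under this assumption, with a pool size of 5 the old subtask 1 owned [5, 10); after scaling out with the pool size reduced to 3, a fresh subtask 2 (which has no previous state) would mint [6, 9), reusing IDs that may still belong to lingering transactions of the old run.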
[jira] [Closed] (FLINK-13498) Reduce Kafka producer startup time by aborting transactions in parallel
[ https://issues.apache.org/jira/browse/FLINK-13498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-13498. --- Resolution: Fixed Fix Version/s: 1.10.0 fixed on master via d774fea > Reduce Kafka producer startup time by aborting transactions in parallel > --- > > Key: FLINK-13498 > URL: https://issues.apache.org/jira/browse/FLINK-13498 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.8.1, 1.9.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When a Flink job with a Kafka producer starts up without previous state, it > currently starts 5 * kafkaPoolSize number of Kafka producers (per sink > instance) to abort potentially existing transactions from a first run without > a completed snapshot. > Apparently, this is quite slow and it is also done sequentially. Until there > is a better way of aborting these transactions with Kafka, we could do this > in parallel quite easily and at least make use of lingering CPU resources. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
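Running the per-transaction aborts concurrently, as FLINK-13498 proposes, can be sketched as follows. This is illustrative Python, not the connector code; {{abort_one}} is a hypothetical stand-in for whatever opens a short-lived producer for a transactional ID and aborts its transaction:

```python
from concurrent.futures import ThreadPoolExecutor

def abort_all(transactional_ids, abort_one, max_workers=8):
    # abort_one is a hypothetical stand-in for the blocking per-ID abort
    # call; running the calls on a small thread pool overlaps their network
    # round trips instead of paying for them one after another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so every abort completes (and any
        # exception is re-raised) before we return.
        list(pool.map(abort_one, transactional_ids))
```

Since the aborts are I/O-bound calls to the broker, even a modest pool turns 5 * kafkaPoolSize sequential round trips into a handful of parallel batches.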
[jira] [Updated] (FLINK-13535) Do not abort transactions twice during KafkaProducer startup
[ https://issues.apache.org/jira/browse/FLINK-13535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13535: Description: During startup of a transactional Kafka producer from previous state, we recover in two steps: # in {{TwoPhaseCommitSinkFunction}}, we commit pending commit-transactions and abort pending transactions and then call into {{finishRecoveringContext()}} # in {{FlinkKafkaProducer#finishRecoveringContext()}} we iterate over all recovered transaction IDs and abort them. This may lead to some transactions being worked on twice. Since this is quite an expensive operation, we unnecessarily slow down the job startup but could easily give {{finishRecoveringContext()}} a set of transactions that {{TwoPhaseCommitSinkFunction}} already covered instead. was: During startup of a transactional Kafka producer from previous state, we recover in two steps: # in {{TwoPhaseCommitSinkFunction}}, we commit pending commit-transactions and abort pending transactions and then call into {{finishRecoveringContext()}} # in {{FlinkKafkaProducer#finishRecoveringContext()}} we iterate over all recovered transaction IDs and abort them This may lead to some transactions being worked on twice. Since this is quite some expensive operation, we unnecessarily slow down the job startup but could easily give {{finishRecoveringContext()}} a set of transactions that {{TwoPhaseCommitSinkFunction}} already covered instead. 
> Do not abort transactions twice during KafkaProducer startup > > > Key: FLINK-13535 > URL: https://issues.apache.org/jira/browse/FLINK-13535 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Affects Versions: 1.8.1, 1.9.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > > During startup of a transactional Kafka producer from previous state, we > recover in two steps: > # in {{TwoPhaseCommitSinkFunction}}, we commit pending commit-transactions > and abort pending transactions and then call into > {{finishRecoveringContext()}} > # in {{FlinkKafkaProducer#finishRecoveringContext()}} we iterate over all > recovered transaction IDs and abort them. > This may lead to some transactions being worked on twice. Since this is quite > an expensive operation, we unnecessarily slow down the job startup but > could easily give {{finishRecoveringContext()}} a set of transactions that > {{TwoPhaseCommitSinkFunction}} already covered instead. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
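The fix proposed in FLINK-13535 amounts to handing step 2 the set of transaction IDs that step 1 already handled, so only the remainder gets aborted. A minimal sketch, with hypothetical names (the real methods take connector-specific transaction objects):

```python
def finish_recovering_context(recovered_ids, already_handled, abort_one):
    # already_handled: IDs that TwoPhaseCommitSinkFunction committed or
    # aborted in step 1; abort_one is a hypothetical per-transaction abort.
    # Only transactions not yet touched need an abort in step 2.
    to_abort = set(recovered_ids) - set(already_handled)
    for txn_id in sorted(to_abort):
        abort_one(txn_id)
    return to_abort
```

The set difference is cheap compared to even a single redundant broker round trip, which is the operation the ticket wants to avoid repeating.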
[jira] [Updated] (FLINK-13537) Changing Kafka producer pool size and scaling out may create overlapping transaction IDs
[ https://issues.apache.org/jira/browse/FLINK-13537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-13537: Description: The Kafka producer's transaction IDs are only generated once when there was no previous state for that operator. In the case where we restore and increase parallelism (scale-out), some operators may not have previous state and create new IDs. Now, if we also reduce the poolSize, these new IDs may overlap with the old ones which should never happen! On a side note, the current scheme also relies on the fact that the operator's list state distributes previous states during scale-out in a fashion that only the operators with the highest subtask indices do not get a previous state. This is somewhat "guaranteed" by {{OperatorStateStore#getListState()}} but I'm not sure whether we should actually rely on that there. was:The Kafka producer's transaction IDs are only generated once when there was no previous state for that operator. In the case where we restore and increase parallelism (scale-out), some operators may not have previous state and create new IDs. Now, if we also reduce the poolSize, these new IDs may overlap with the old ones which should never happen! > Changing Kafka producer pool size and scaling out may create overlapping > transaction IDs > > > Key: FLINK-13537 > URL: https://issues.apache.org/jira/browse/FLINK-13537 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.8.1, 1.9.0 >Reporter: Nico Kruber >Priority: Major > > The Kafka producer's transaction IDs are only generated once when there was > no previous state for that operator. In the case where we restore and > increase parallelism (scale-out), some operators may not have previous state > and create new IDs. Now, if we also reduce the poolSize, these new IDs may > overlap with the old ones which should never happen! 
> On a side note, the current scheme also relies on the fact that the > operator's list state distributes previous states during scale-out in a > fashion that only the operators with the highest subtask indices do not get a > previous state. This is somewhat "guaranteed" by > {{OperatorStateStore#getListState()}} but I'm not sure whether we should > actually rely on that there. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13537) Changing Kafka producer pool size and scaling out may create overlapping transaction IDs
Nico Kruber created FLINK-13537: --- Summary: Changing Kafka producer pool size and scaling out may create overlapping transaction IDs Key: FLINK-13537 URL: https://issues.apache.org/jira/browse/FLINK-13537 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.8.1, 1.9.0 Reporter: Nico Kruber The Kafka producer's transaction IDs are only generated once when there was no previous state for that operator. In the case where we restore and increase parallelism (scale-out), some operators may not have previous state and create new IDs. Now, if we also reduce the poolSize, these new IDs may overlap with the old ones which should never happen! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13535) Do not abort transactions twice during KafkaProducer startup
Nico Kruber created FLINK-13535: --- Summary: Do not abort transactions twice during KafkaProducer startup Key: FLINK-13535 URL: https://issues.apache.org/jira/browse/FLINK-13535 Project: Flink Issue Type: Improvement Components: Connectors / Kafka Affects Versions: 1.8.1, 1.9.0 Reporter: Nico Kruber Assignee: Nico Kruber During startup of a transactional Kafka producer from previous state, we recover in two steps: # in {{TwoPhaseCommitSinkFunction}}, we commit pending commit-transactions and abort pending transactions and then call into {{finishRecoveringContext()}} # in {{FlinkKafkaProducer#finishRecoveringContext()}} we iterate over all recovered transaction IDs and abort them. This may lead to some transactions being worked on twice. Since this is quite an expensive operation, we unnecessarily slow down the job startup but could easily give {{finishRecoveringContext()}} a set of transactions that {{TwoPhaseCommitSinkFunction}} already covered instead. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (FLINK-13517) Restructure Hive Catalog documentation
[ https://issues.apache.org/jira/browse/FLINK-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-13517: --- Assignee: Seth Wiesman > Restructure Hive Catalog documentation > -- > > Key: FLINK-13517 > URL: https://issues.apache.org/jira/browse/FLINK-13517 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hive, Documentation >Reporter: Seth Wiesman >Assignee: Seth Wiesman >Priority: Major > > Hive documentation is currently spread across a number of pages and > fragmented. In particular: > 1) An example was added to getting-started/examples, however, this section is > being removed > 2) There is a dedicated page on hive integration but also a lot of hive > specific information is on the catalog page > We should > 1) Inline the example into the hive integration page > 2) Move the hive specific information on catalogs.md to hive_integration.md > 3) Make catalogs.md be just about catalogs in general and link to the hive > integration. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13498) Reduce Kafka producer startup time by aborting transactions in parallel
Nico Kruber created FLINK-13498: --- Summary: Reduce Kafka producer startup time by aborting transactions in parallel Key: FLINK-13498 URL: https://issues.apache.org/jira/browse/FLINK-13498 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.8.1, 1.9.0 Reporter: Nico Kruber Assignee: Nico Kruber When a Flink job with a Kafka producer starts up without previous state, it currently starts 5 * kafkaPoolSize number of Kafka producers (per sink instance) to abort potentially existing transactions from a first run without a completed snapshot. Apparently, this is quite slow and it is also done sequentially. Until there is a better way of aborting these transactions with Kafka, we could do this in parallel quite easily and at least make use of lingering CPU resources. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Closed] (FLINK-12747) Getting Started - Table API Example Walkthrough
[ https://issues.apache.org/jira/browse/FLINK-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-12747. --- > Getting Started - Table API Example Walkthrough > --- > > Key: FLINK-12747 > URL: https://issues.apache.org/jira/browse/FLINK-12747 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Konstantin Knauf >Assignee: Seth Wiesman >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The planned structure for the new Getting Started Guide is > * Flink Overview (~ two pages) > * Project Setup > ** Java > ** Scala > ** Python > * Quickstarts > ** Example Walkthrough - Table API / SQL > ** Example Walkthrough - DataStream API > * Docker Playgrounds > ** Flink Cluster Playground > ** Flink Interactive SQL Playground > This ticket adds the Example Walkthrough for the Table API, which should > follow the same structure as the DataStream Example (FLINK-12746), which > needs to be completed first. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (FLINK-12747) Getting Started - Table API Example Walkthrough
[ https://issues.apache.org/jira/browse/FLINK-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-12747: Fix Version/s: (was: 1.10) 1.10.0 > Getting Started - Table API Example Walkthrough > --- > > Key: FLINK-12747 > URL: https://issues.apache.org/jira/browse/FLINK-12747 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Konstantin Knauf >Assignee: Seth Wiesman >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The planned structure for the new Getting Started Guide is > * Flink Overview (~ two pages) > * Project Setup > ** Java > ** Scala > ** Python > * Quickstarts > ** Example Walkthrough - Table API / SQL > ** Example Walkthrough - DataStream API > * Docker Playgrounds > ** Flink Cluster Playground > ** Flink Interactive SQL Playground > This ticket adds the Example Walkthrough for the Table API, which should > follow the same structure as the DataStream Example (FLINK-12746), which > needs to be completed first. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (FLINK-12171) The network buffer memory size should not be checked against the heap size on the TM side
[ https://issues.apache.org/jira/browse/FLINK-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-12171: Fix Version/s: (was: 1.10) 1.10.0 > The network buffer memory size should not be checked against the heap size on > the TM side > - > > Key: FLINK-12171 > URL: https://issues.apache.org/jira/browse/FLINK-12171 > Project: Flink > Issue Type: Bug > Components: Runtime / Network >Affects Versions: 1.7.2, 1.8.0, 1.9.0 > Environment: Flink-1.7.2, and Flink-1.8 seems to have not modified the > logic here. > >Reporter: Yun Gao >Assignee: Yun Gao >Priority: Major > Labels: pull-request-available > Fix For: 1.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently when computing the network buffer memory size on the TM side in > _TaskManagerService#calculateNetworkBufferMemory_ (version 1.8 or 1.7) or > _NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master), > the computed network buffer memory size is checked to be less than > `maxJvmHeapMemory`. However, on the TM side, _maxJvmHeapMemory_ stores the > maximum heap memory (namely -Xmx). > > With the above process, when TM starts, -Xmx is computed in RM or in > _taskmanager.sh_ with (container memory - network buffer memory - managed > memory), thus the above checking implies that the heap memory of the TM must > be larger than the network memory, which seems not to be necessary. > > This may cause TM to use more memory than expected. For example, for a job > that has a large network throughput, users may configure network memory to 2G. > However, if users want to assign 1G to heap memory, the TM will fail to > start, and users have to allocate at least 2G heap memory (in other words, 4G > in total for the TM instead of 3G) to make the TM runnable. This may cause > resource inefficiency. 
> > Therefore, I think the network buffer memory size also needs to be checked > against the total memory instead of the heap memory on the TM side: > # Check that networkBufFraction < 1.0. > # Compute the total memory by (jvmHeapNoNet / (1 - networkBufFraction)). > # Compare the network buffer memory with the total memory. > This checking is also consistent with the similar one done on the RM side. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
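The three-step check proposed in FLINK-12171 can be sketched numerically (illustrative Python; the function and parameter names are assumptions, not Flink's). With the ticket's example — 1G heap plus 2G network, i.e. a network fraction of 2/3 — the network size passes against the 3G total even though it exceeds the heap:

```python
GiB = 1024 ** 3

def network_buffer_size_ok(network_buf, jvm_heap_no_net, network_buf_fraction):
    # Step 1: the fraction must leave room for the non-network memory.
    if not 0.0 < network_buf_fraction < 1.0:
        raise ValueError("networkBufFraction must be in (0, 1)")
    # Step 2: reconstruct the total memory from the heap size that already
    # excludes network buffers (-Xmx = container - network - managed).
    total = jvm_heap_no_net / (1.0 - network_buf_fraction)
    # Step 3: compare the network memory against the total, not the heap.
    return network_buf <= total

# 1 GiB heap + 2 GiB network => total 3 GiB at fraction 2/3: accepted,
# even though the network memory is larger than the heap.
print(network_buffer_size_ok(2 * GiB, 1 * GiB, 2 / 3))  # True
```

Under the old heap-based check the same configuration would be rejected, which is exactly the 4G-instead-of-3G inefficiency the ticket describes.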
[jira] [Closed] (FLINK-12171) The network buffer memory size should not be checked against the heap size on the TM side
[ https://issues.apache.org/jira/browse/FLINK-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-12171. --- Resolution: Fixed Fix Version/s: 1.10 fixed via 8dec21f > The network buffer memory size should not be checked against the heap size on > the TM side > - > > Key: FLINK-12171 > URL: https://issues.apache.org/jira/browse/FLINK-12171 > Project: Flink > Issue Type: Bug > Components: Runtime / Network >Affects Versions: 1.7.2, 1.8.0, 1.9.0 > Environment: Flink-1.7.2, and Flink-1.8 seems to have not modified the > logic here. > >Reporter: Yun Gao >Assignee: Yun Gao >Priority: Major > Labels: pull-request-available > Fix For: 1.10 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently when computing the network buffer memory size on the TM side in > _TaskManagerService#calculateNetworkBufferMemory_ (version 1.8 or 1.7) or > _NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master), > the computed network buffer memory size is checked to be less than > `maxJvmHeapMemory`. However, on the TM side, _maxJvmHeapMemory_ stores the > maximum heap memory (namely -Xmx). > > With the above process, when TM starts, -Xmx is computed in RM or in > _taskmanager.sh_ with (container memory - network buffer memory - managed > memory), thus the above checking implies that the heap memory of the TM must > be larger than the network memory, which seems not to be necessary. > > This may cause TM to use more memory than expected. For example, for a job > that has a large network throughput, users may configure network memory to 2G. > However, if users want to assign 1G to heap memory, the TM will fail to > start, and users have to allocate at least 2G heap memory (in other words, 4G > in total for the TM instead of 3G) to make the TM runnable. This may cause > resource inefficiency. 
> > Therefore, I think the network buffer memory size also needs to be checked > against the total memory instead of the heap memory on the TM side: > # Check that networkBufFraction < 1.0. > # Compute the total memory by (jvmHeapNoNet / (1 - networkBufFraction)). > # Compare the network buffer memory with the total memory. > This checking is also consistent with the similar one done on the RM side. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (FLINK-12747) Getting Started - Table API Example Walkthrough
[ https://issues.apache.org/jira/browse/FLINK-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber resolved FLINK-12747. - Resolution: Fixed Fix Version/s: 1.10 fixed via f4943dd > Getting Started - Table API Example Walkthrough > --- > > Key: FLINK-12747 > URL: https://issues.apache.org/jira/browse/FLINK-12747 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Konstantin Knauf >Assignee: Seth Wiesman >Priority: Major > Labels: pull-request-available > Fix For: 1.10 > > Time Spent: 20m > Remaining Estimate: 0h > > The planned structure for the new Getting Started Guide is > * Flink Overview (~ two pages) > * Project Setup > ** Java > ** Scala > ** Python > * Quickstarts > ** Example Walkthrough - Table API / SQL > ** Example Walkthrough - DataStream API > * Docker Playgrounds > ** Flink Cluster Playground > ** Flink Interactive SQL Playground > This ticket adds the Example Walkthrough for the Table API, which should > follow the same structure as the DataStream Example (FLINK-12746), which > needs to be completed first. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (FLINK-13417) Bump Zookeeper to 3.5.5
[ https://issues.apache.org/jira/browse/FLINK-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895170#comment-16895170 ] Nico Kruber commented on FLINK-13417: - FYI: since 3.5.5 is the first stable version in the 3.5.x series[1], we should actually take this, not any older 3.5.x [1] https://zookeeper.apache.org/releases.html > Bump Zookeeper to 3.5.5 > --- > > Key: FLINK-13417 > URL: https://issues.apache.org/jira/browse/FLINK-13417 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.9.0 >Reporter: Konstantin Knauf >Priority: Major > > User might want to secure their Zookeeper connection via SSL. > This requires a Zookeeper version >= 3.5.1. We might as well try to bump it > to 3.5.5, which is the latest version. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Closed] (FLINK-12741) Update docs about Kafka producer fault tolerance guarantees
[ https://issues.apache.org/jira/browse/FLINK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-12741. --- Resolution: Fixed Fix Version/s: 1.8.2 1.7.3 merged for - 1.7: 56c3e7cd653e4cb2ad0a76ca317aa9fa1d564dc2 - 1.8: 91d036f794cfd96a3c1da445d5172690054aee2f > Update docs about Kafka producer fault tolerance guarantees > --- > > Key: FLINK-12741 > URL: https://issues.apache.org/jira/browse/FLINK-12741 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.9.0 >Reporter: Paul Lin >Assignee: Paul Lin >Priority: Trivial > Labels: pull-request-available > Fix For: 1.7.3, 1.8.2, 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Since Flink 1.4.0, we provide exactly-once semantics on Kafka 0.11+, but the > document is still not updated. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Reopened] (FLINK-12741) Update docs about Kafka producer fault tolerance guarantees
[ https://issues.apache.org/jira/browse/FLINK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reopened FLINK-12741: - > Update docs about Kafka producer fault tolerance guarantees > --- > > Key: FLINK-12741 > URL: https://issues.apache.org/jira/browse/FLINK-12741 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.9.0 >Reporter: Paul Lin >Assignee: Paul Lin >Priority: Trivial > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Since Flink 1.4.0, we provide exactly-once semantics on Kafka 0.11+, but the > document is still not updated. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (FLINK-13245) Network stack is leaking files
[ https://issues.apache.org/jira/browse/FLINK-13245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888982#comment-16888982 ] Nico Kruber commented on FLINK-13245: - I agree with [~zjwang] - changing the semantics should be tackled separately, not necessarily as part of this bug fix. I'll see when I have time to look at the PR so we can get this merged > Network stack is leaking files > -- > > Key: FLINK-13245 > URL: https://issues.apache.org/jira/browse/FLINK-13245 > Project: Flink > Issue Type: Bug > Components: Runtime / Network >Affects Versions: 1.9.0 >Reporter: Chesnay Schepler >Assignee: zhijiang >Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 10m > Remaining Estimate: 0h > > There's file leak in the network stack / shuffle service. > When running the {{SlotCountExceedingParallelismTest}} on Windows a large > number of {{.channel}} files continue to reside in a > {{flink-netty-shuffle-XXX}} directory. > From what I've gathered so far these files are still being used by a > {{BoundedBlockingSubpartition}}. The cleanup logic in this class uses > ref-counting to ensure we don't release data while a reader is still present. > However, at the end of the job this count has not reached 0, and thus nothing > is being released. > The same issue is also present on the {{ResultPartition}} level; the > {{ReleaseOnConsumptionResultPartition}} also are being released while the > ref-count is greater than 0. > Overall it appears like there's some issue with the notifications for > partitions being consumed. > It is feasible that this issue has recently caused issues on Travis where the > build were failing due to a lack of disk space. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
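The release-on-zero pattern whose counter never reaches 0 in FLINK-13245 can be shown in miniature. This is a sketch, not the actual {{BoundedBlockingSubpartition}} code: the data may only be freed once every registered reader has been released, so one lost consumption notification leaks the backing file forever:

```python
import threading

class RefCountedPartition:
    """Minimal model of ref-counted release: the backing data may only be
    freed once every registered reader has been released."""

    def __init__(self):
        self._refs = 0
        self._released = False
        self._lock = threading.Lock()

    def acquire_reader(self):
        with self._lock:
            self._refs += 1

    def release_reader(self):
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                # This is where the .channel file would be deleted; if a
                # consumption notification is lost, we never get here.
                self._released = True
```

With two acquired readers and only one release, `_released` stays False — the analogue of the `.channel` files lingering after the job ends.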
[jira] [Closed] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3
[ https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-8801. -- Resolution: Fixed > S3's eventual consistent read-after-write may fail yarn deployment of > resources to S3 > - > > Key: FLINK-8801 > URL: https://issues.apache.org/jira/browse/FLINK-8801 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / Coordination >Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0, 1.4.3, 1.5.0 > > Time Spent: 50m > Remaining Estimate: 0h > > According to > https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel: > {quote} > Amazon S3 provides read-after-write consistency for PUTS of new objects in > your S3 bucket in all regions with one caveat. The caveat is that if you make > a HEAD or GET request to the key name (to find if the object exists) before > creating the object, Amazon S3 provides eventual consistency for > read-after-write. > {quote} > Some S3 file system implementations may actually execute such a request for > the about-to-write object and thus the read-after-write is only eventually > consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently > relies on a consistent read-after-write since it accesses the remote resource > to get file size and modification timestamp. Since there we have access to > the local resource, we can use the data from there instead and circumvent the > problem. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
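The fix direction FLINK-8801 describes — taking the size and modification timestamp from the local copy instead of issuing a HEAD/GET against the just-written remote key — can be sketched like this (illustrative Python, not the actual {{org.apache.flink.yarn.Utils#setupLocalResource()}} code):

```python
import os

def local_resource_metadata(local_path):
    # Read length and modification time from the local file; this avoids the
    # HEAD/GET on the remote key that makes S3's read-after-write only
    # eventually consistent after a fresh PUT.
    st = os.stat(local_path)
    return st.st_size, int(st.st_mtime)
```

Since the uploaded bytes come from this very file, the local metadata matches what a (consistent) remote lookup would eventually return, with no dependency on S3's consistency model.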
[jira] [Updated] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3
[ https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber updated FLINK-8801: --- Fix Version/s: (was: 1.10.0) > S3's eventual consistent read-after-write may fail yarn deployment of > resources to S3 > - > > Key: FLINK-8801 > URL: https://issues.apache.org/jira/browse/FLINK-8801 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / Coordination >Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available > Fix For: 1.4.3, 1.5.0, 1.9.0 > > Time Spent: 50m > Remaining Estimate: 0h > > According to > https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel: > {quote} > Amazon S3 provides read-after-write consistency for PUTS of new objects in > your S3 bucket in all regions with one caveat. The caveat is that if you make > a HEAD or GET request to the key name (to find if the object exists) before > creating the object, Amazon S3 provides eventual consistency for > read-after-write. > {quote} > Some S3 file system implementations may actually execute such a request for > the about-to-write object and thus the read-after-write is only eventually > consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently > relies on a consistent read-after-write since it accesses the remote resource > to get file size and modification timestamp. Since there we have access to > the local resource, we can use the data from there instead and circumvent the > problem. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Reopened] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3
[ https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reopened FLINK-8801: > S3's eventual consistent read-after-write may fail yarn deployment of > resources to S3 > - > > Key: FLINK-8801 > URL: https://issues.apache.org/jira/browse/FLINK-8801 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / Coordination >Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available > Fix For: 1.4.3, 1.5.0, 1.9.0, 1.10.0 > > Time Spent: 50m > Remaining Estimate: 0h > > According to > https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel: > {quote} > Amazon S3 provides read-after-write consistency for PUTS of new objects in > your S3 bucket in all regions with one caveat. The caveat is that if you make > a HEAD or GET request to the key name (to find if the object exists) before > creating the object, Amazon S3 provides eventual consistency for > read-after-write. > {quote} > Some S3 file system implementations may actually execute such a request for > the about-to-write object and thus the read-after-write is only eventually > consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently > relies on a consistent read-after-write since it accesses the remote resource > to get file size and modification timestamp. Since there we have access to > the local resource, we can use the data from there instead and circumvent the > problem. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Closed] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3
[ https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber closed FLINK-8801. -- Resolution: Fixed Fix Version/s: 1.9.0 also merged to release-1.9 via b56234ce4e > S3's eventual consistent read-after-write may fail yarn deployment of > resources to S3 > - > > Key: FLINK-8801 > URL: https://issues.apache.org/jira/browse/FLINK-8801 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / Coordination >Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0, 1.10.0, 1.4.3, 1.5.0 > > Time Spent: 50m > Remaining Estimate: 0h > > According to > https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel: > {quote} > Amazon S3 provides read-after-write consistency for PUTS of new objects in > your S3 bucket in all regions with one caveat. The caveat is that if you make > a HEAD or GET request to the key name (to find if the object exists) before > creating the object, Amazon S3 provides eventual consistency for > read-after-write. > {quote} > Some S3 file system implementations may actually execute such a request for > the about-to-write object and thus the read-after-write is only eventually > consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently > relies on a consistent read-after-write since it accesses the remote resource > to get file size and modification timestamp. Since there we have access to > the local resource, we can use the data from there instead and circumvent the > problem. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Reopened] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3
[ https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reopened FLINK-8801: > S3's eventual consistent read-after-write may fail yarn deployment of > resources to S3 > - > > Key: FLINK-8801 > URL: https://issues.apache.org/jira/browse/FLINK-8801 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, FileSystems, Runtime / Coordination >Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available > Fix For: 1.4.3, 1.5.0, 1.10.0 > > Time Spent: 50m > Remaining Estimate: 0h > > According to > https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel: > {quote} > Amazon S3 provides read-after-write consistency for PUTS of new objects in > your S3 bucket in all regions with one caveat. The caveat is that if you make > a HEAD or GET request to the key name (to find if the object exists) before > creating the object, Amazon S3 provides eventual consistency for > read-after-write. > {quote} > Some S3 file system implementations may actually execute such a request for > the about-to-write object and thus the read-after-write is only eventually > consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently > relies on a consistent read-after-write since it accesses the remote resource > to get file size and modification timestamp. Since there we have access to > the local resource, we can use the data from there instead and circumvent the > problem. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3
[ https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber resolved FLINK-8801.
--------------------------------
    Resolution: Fixed
 Fix Version/s: 1.10.0

merged to master via 770a404
[jira] [Reopened] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3
[ https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber reopened FLINK-8801.
[jira] [Updated] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3
[ https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-8801:
-------------------------------
    Affects Version/s: 1.9.0
                       1.6.4
                       1.7.2
                       1.8.1
[jira] [Closed] (FLINK-13173) Only run openSSL tests if desired
[ https://issues.apache.org/jira/browse/FLINK-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber closed FLINK-13173.
-------------------------------
    Resolution: Fixed

Fixed in master via 6d79968f04d549d37b3bcda086a1484e78f61ac3

> Only run openSSL tests if desired
> ---------------------------------
>
>                 Key: FLINK-13173
>                 URL: https://issues.apache.org/jira/browse/FLINK-13173
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network, Tests, Travis
>    Affects Versions: 1.9.0
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Rename {{flink.tests.force-openssl}} to {{flink.tests.with-openssl}} and only run openSSL-based unit tests if this property is set. This way, we avoid test failures on systems where the bundled dynamic libraries do not work. Travis runs these tests fine and will have this property set.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
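The opt-in gating described in the issue can be sketched as follows (a minimal illustration, assuming only the {{flink.tests.with-openssl}} system property named in the issue; the {{OpenSslTestGate}} class and method are hypothetical, not Flink's actual test utilities):

```java
public class OpenSslTestGate {

    /**
     * True only if the JVM was started with -Dflink.tests.with-openssl=true.
     * Boolean.getBoolean() returns false when the property is absent or has
     * any value other than "true", so openSSL tests are skipped by default.
     */
    public static boolean openSslTestsEnabled() {
        return Boolean.getBoolean("flink.tests.with-openssl");
    }
}
```

In a JUnit 4 test this flag would typically feed an assumption, e.g. `Assume.assumeTrue(openSslTestsEnabled())` in a setup method, so that machines where the bundled dynamic libraries cannot load report the tests as skipped rather than crashed.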
[jira] [Updated] (FLINK-13172) JVM crash with dynamic netty-tcnative wrapper to openSSL on some OS
[ https://issues.apache.org/jira/browse/FLINK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-13172:
--------------------------------
    Description:
The dynamically-linked wrapper library in {{flink-shaded-netty-tcnative-dynamic}} may not work on all systems, depending on how the system-provided openSSL library is built. As a result, when trying to run Flink with {{security.ssl.provider: OPENSSL}}, or just running a test based on {{SSLUtilsTest}} (whose check for openSSL availability is enough to trigger the error below), the JVM will crash, e.g. with:

- on SUSE-based systems:
{code}
/usr/lib64/jvm/java-openjdk/bin/java: relocation error: /tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_644115489043239307863.so: symbol TLSv1_2_server_method version OPENSSL_1.0.1 not defined in file libssl.so.1.0.0 with link time reference
{code}
- on Arch Linux:
{code}
/usr/lib/jvm/default/bin/java: relocation error: /tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_648476498532937980008.so: symbol SSLv3_method version OPENSSL_1.0.0 not defined in file libssl.so.1.0.0 with link time reference
{code}

Possible solutions:
# build your own OS-dependent, dynamically-linked {{netty-tcnative}} library and shade it in your own build of {{flink-shaded-netty-tcnative-dynamic}}, or
# use {{flink-shaded-netty-tcnative-static}}:
{code}
git clone https://github.com/apache/flink-shaded.git
cd flink-shaded
mvn clean package -Pinclude-netty-tcnative-static -pl flink-shaded-netty-tcnative-static
{code}
# get your OS-dependent build into netty-tcnative as a special branch, similar to what they currently do for Fedora-based systems

  was:
The dynamically-linked wrapper library in {{flink-shaded-netty-tcnative-dynamic}} may not work on all systems, depending on how the system-provided openSSL library is built. As a result, when trying to run Flink with {{security.ssl.provider: OPENSSL}}, or running a test based on {{SSLUtilsTest}} (which checks for openSSL availability), the JVM will crash, e.g. with:

- on SUSE-based systems:
{code}
/usr/lib64/jvm/java-openjdk/bin/java: relocation error: /tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_644115489043239307863.so: symbol TLSv1_2_server_method version OPENSSL_1.0.1 not defined in file libssl.so.1.0.0 with link time reference
{code}
- on Arch Linux:
{code}
/usr/lib/jvm/default/bin/java: relocation error: /tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_648476498532937980008.so: symbol SSLv3_method version OPENSSL_1.0.0 not defined in file libssl.so.1.0.0 with link time reference
{code}

Possible solutions:
# build your own OS-dependent, dynamically-linked {{netty-tcnative}} library and shade it in your own build of {{flink-shaded-netty-tcnative-dynamic}}, or
# use {{flink-shaded-netty-tcnative-static}}:
{code}
git clone https://github.com/apache/flink-shaded.git
cd flink-shaded
mvn clean package -Pinclude-netty-tcnative-static -pl flink-shaded-netty-tcnative-static
{code}
# get your OS-dependent build into netty-tcnative as a special branch, similar to what they currently do for Fedora-based systems

> JVM crash with dynamic netty-tcnative wrapper to openSSL on some OS
> -------------------------------------------------------------------
>
>                 Key: FLINK-13172
>                 URL: https://issues.apache.org/jira/browse/FLINK-13172
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network, Tests
>    Affects Versions: 1.9.0
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Major