[jira] [Closed] (FLINK-14825) Rework state processor api documentation

2020-01-08 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-14825.
---
Fix Version/s: 1.9.2
   Resolution: Fixed

merged to
 * master: bd56224c3063fd23d508a4250e5698d4840fa488
 * release-1.10: b11b010aaacfc6e65d5c703d22e39e642121ce38
 * release-1.9: 24f5c5cd901332761d8deaa85f208f8ad2514b2b

> Rework state processor api documentation
> 
>
> Key: FLINK-14825
> URL: https://issues.apache.org/jira/browse/FLINK-14825
> Project: Flink
>  Issue Type: Improvement
>  Components: API / State Processor, Documentation
>Reporter: Seth Wiesman
>Assignee: Seth Wiesman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.2, 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current version of the SPA (State Processor API) docs was rushed to meet 
> the 1.9 release. We should rewrite them to be more complete, include better 
> examples, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-11789) Checkpoint directories are not cleaned up after job termination

2020-01-06 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008837#comment-17008837
 ] 

Nico Kruber commented on FLINK-11789:
-

One more thing to consider: I was using Azure Blob Storage for the checkpoint 
and savepoint directories with a job that had no checkpoints. After taking some 
savepoints, I still had the aforementioned directories lying around (while the 
job was running and afterwards, as elaborated above). Since a savepoint 
essentially does the same thing as a checkpoint, I understand why these 
directories are there, but without active checkpoints, they are strictly not 
even necessary in the first place.

However, I just wanted to point out that this also affects jobs with savepoints 
only.

> Checkpoint directories are not cleaned up after job termination
> ---
>
> Key: FLINK-11789
> URL: https://issues.apache.org/jira/browse/FLINK-11789
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.9.0
>Reporter: Till Rohrmann
>Priority: Major
>
> Flink currently does not clean up all checkpoint directories when a job 
> reaches a globally terminal state. Having configured the checkpoint directory 
> {{checkpoints}}, I observe that after cancelling the job {{JOB_ID}} there are 
> still
> {code}
> checkpoints/JOB_ID/shared
> checkpoints/JOB_ID/taskowned
> {code}
> I think it would be good if we would delete {{checkpoints/JOB_ID}}
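For illustration, the proposed recursive deletion of {{checkpoints/JOB_ID}} can 
be sketched with plain {{java.nio.file}} (a hypothetical sketch, not Flink's 
actual shutdown code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class CheckpointDirCleanup {

    /** Recursively deletes a job's checkpoint directory, e.g. checkpoints/JOB_ID. */
    public static void deleteJobDirectory(Path jobDir) throws IOException {
        if (!Files.exists(jobDir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> paths = Files.walk(jobDir)) {
            // reverse order deletes children (chk-*, shared, taskowned) before the parent
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```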





[jira] [Created] (FLINK-15335) add-dependencies-for-IDEA not working anymore and dangerous in general

2019-12-19 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-15335:
---

 Summary: add-dependencies-for-IDEA not working anymore and 
dangerous in general
 Key: FLINK-15335
 URL: https://issues.apache.org/jira/browse/FLINK-15335
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Quickstarts
Affects Versions: 1.9.1, 1.8.3, 1.10.0
Reporter: Nico Kruber


The quickstart's {{add-dependencies-for-IDEA}} profile (for including 
{{flink-runtime}} and further dependencies that are usually {{provided}}) is no 
longer automatically enabled by IntelliJ, since recent IntelliJ versions do not 
set the {{idea.version}} property anymore. My IntelliJ, for example, sets 
{{idea.version}} to {{2019.3}} instead, but even if the profile activation is 
changed to match, IntelliJ does not enable the profile by default.

There are two workarounds:
 * Tick {{Include dependencies with "Provided" scope}} in the run configuration 
(available in any newer IntelliJ version, probably since 2018), or
 * enable the profile manually - downside: if you create a jar inside IntelliJ 
via its own Maven targets, the jar would contain the provided dependencies, 
making it unsuitable for submission to a Flink cluster.
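For context, the quickstart pom activates the profile on that property, roughly 
like the following (an abbreviated sketch from memory, not the literal 
quickstart contents):

```xml
<profile>
  <id>add-dependencies-for-IDEA</id>
  <activation>
    <property>
      <!-- IntelliJ used to set this property; recent versions no longer do -->
      <name>idea.version</name>
    </property>
  </activation>
  <dependencies>
    <!-- normally 'provided' dependencies re-added in 'compile' scope, e.g.: -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.11</artifactId>
      <version>${flink.version}</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
</profile>
```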

I propose to remove the {{add-dependencies-for-IDEA}} profile for good (from 
the quickstarts) and adapt the documentation accordingly, e.g. 
[https://ci.apache.org/projects/flink/flink-docs-stable/dev/projectsetup/dependencies.html#setting-up-a-project-basic-dependencies]





[jira] [Updated] (FLINK-15298) Wrong dependences in the DataStream API tutorial (the wiki-edits example)

2019-12-17 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-15298:

Affects Version/s: 1.7.2
   1.8.3

> Wrong dependences in the DataStream API tutorial (the wiki-edits example)
> -
>
> Key: FLINK-15298
> URL: https://issues.apache.org/jira/browse/FLINK-15298
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.7.2, 1.8.3, 1.9.0, 1.9.1
>Reporter: Jun Qin
>Priority: Major
>
> [The DataStream API Tutorial in Flink 1.9 | 
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/getting-started/tutorials/datastream_api.html]
>  mentioned the following dependences:
> {code:java}
> <dependencies>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-java</artifactId>
>     <version>${flink.version}</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-streaming-java_2.11</artifactId>
>     <version>${flink.version}</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-clients_2.11</artifactId>
>     <version>${flink.version}</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-connector-wikiedits_2.11</artifactId>
>     <version>${flink.version}</version>
>   </dependency>
> </dependencies>
> {code}
> There are two issues here:
> # {{flink-java}} and {{flink-streaming-java}} should be set to *provided* 
> scope
> # {{flink-clients}} is not needed. If {{flink-clients}} is added into *compile* 
> scope, {{flink-runtime}} will be added implicitly
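A corrected dependency block addressing both points could look roughly like 
this (a sketch; artifact names and versions as assumed from the tutorial):

```xml
<dependencies>
  <!-- 'provided': these are part of the Flink distribution at runtime -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
  </dependency>
  <!-- flink-clients dropped: not needed in compile scope -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-wikiedits_2.11</artifactId>
    <version>${flink.version}</version>
  </dependency>
</dependencies>
```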





[jira] [Updated] (FLINK-15298) Wrong dependences in the DataStream API tutorial (the wiki-edits example)

2019-12-17 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-15298:

Affects Version/s: (was: 1.9.0)

> Wrong dependences in the DataStream API tutorial (the wiki-edits example)
> -
>
> Key: FLINK-15298
> URL: https://issues.apache.org/jira/browse/FLINK-15298
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.7.2, 1.8.3, 1.9.1
>Reporter: Jun Qin
>Priority: Major
>
> [The DataStream API Tutorial in Flink 1.9 | 
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/getting-started/tutorials/datastream_api.html]
>  mentioned the following dependences:
> {code:java}
> <dependencies>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-java</artifactId>
>     <version>${flink.version}</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-streaming-java_2.11</artifactId>
>     <version>${flink.version}</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-clients_2.11</artifactId>
>     <version>${flink.version}</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-connector-wikiedits_2.11</artifactId>
>     <version>${flink.version}</version>
>   </dependency>
> </dependencies>
> {code}
> There are two issues here:
> # {{flink-java}} and {{flink-streaming-java}} should be set to *provided* 
> scope
> # {{flink-clients}} is not needed. If {{flink-clients}} is added into *compile* 
> scope, {{flink-runtime}} will be added implicitly





[jira] [Commented] (FLINK-15010) Temp directories flink-netty-shuffle-* are not cleaned up

2019-12-17 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998148#comment-16998148
 ] 

Nico Kruber commented on FLINK-15010:
-

I used {{start-cluster.sh}} and added {{localhost}} twice into {{conf/slaves}}

> Temp directories flink-netty-shuffle-* are not cleaned up
> -
>
> Key: FLINK-15010
> URL: https://issues.apache.org/jira/browse/FLINK-15010
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Affects Versions: 1.9.1
>Reporter: Nico Kruber
>Priority: Major
>
> Starting a Flink cluster with 2 TMs and stopping it again will leave 2 
> temporary directories (and not delete them): flink-netty-shuffle-*





[jira] [Assigned] (FLINK-14942) State Processing API: add an option to make deep copy

2019-12-13 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reassigned FLINK-14942:
---

Assignee: Jun Qin

> State Processing API: add an option to make deep copy
> -
>
> Key: FLINK-14942
> URL: https://issues.apache.org/jira/browse/FLINK-14942
> Project: Flink
>  Issue Type: Improvement
>  Components: API / State Processor
>Affects Versions: 1.11.0
>Reporter: Jun Qin
>Assignee: Jun Qin
>Priority: Major
>
> Currently, when a new savepoint is created based on a source savepoint, the 
> new savepoint contains references to the source savepoint. The [State 
> Processing API 
> doc|https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/libs/state_processor_api.html]
>  says: 
> bq. Note: When basing a new savepoint on existing state, the state processor 
> api makes a shallow copy of the pointers to the existing operators. This 
> means that both savepoints share state and one cannot be deleted without 
> corrupting the other!
> This JIRA requests an option to make a deep copy (instead of a shallow copy) 
> such that the new savepoint is self-contained.





[jira] [Resolved] (FLINK-15113) fs.azure.account.key not hidden from global configuration

2019-12-06 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber resolved FLINK-15113.
-
Resolution: Fixed

Fixed in
 * master: e9afee736acaaf8c74c66c52fa651d565cd48b10
 * release-1.9: 1b490927d391baaef4bce7421461a6eb2bd66254

> fs.azure.account.key not hidden from global configuration
> -
>
> Key: FLINK-15113
> URL: https://issues.apache.org/jira/browse/FLINK-15113
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.1
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.9.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For access to Azure's Blob Storage, you need to provide the (secret) key with
> {{fs.azure.account.key.<account>.core.windows.net}}
> This value, however, is not hidden from the global configuration which only 
> specifies configurations with keys containing "password" or "secret" as 
> sensitive.
> We should add {{fs.azure.account.key}} to that list as well.
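The hiding logic amounts to a substring check over a list of sensitive 
patterns; a plain-Java sketch of the proposed behaviour (class name, method 
names, and the "******" placeholder are illustrative, not Flink's actual 
implementation):

```java
import java.util.Arrays;
import java.util.List;

public class SensitiveConfig {
    // "password" and "secret" are the current patterns;
    // "fs.azure.account.key" is the proposed addition
    private static final List<String> SENSITIVE_PATTERNS =
            Arrays.asList("password", "secret", "fs.azure.account.key");

    /** True if a config key's value should be hidden in the web UI. */
    public static boolean isSensitive(String key) {
        String lower = key.toLowerCase();
        return SENSITIVE_PATTERNS.stream().anyMatch(lower::contains);
    }

    /** Returns the value to show in the global configuration view. */
    public static String displayValue(String key, String value) {
        return isSensitive(key) ? "******" : value;
    }
}
```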





[jira] [Updated] (FLINK-15113) fs.azure.account.key not hidden from global configuration

2019-12-06 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-15113:

Fix Version/s: (was: 1.8.3)

> fs.azure.account.key not hidden from global configuration
> -
>
> Key: FLINK-15113
> URL: https://issues.apache.org/jira/browse/FLINK-15113
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.1
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.9.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For access to Azure's Blob Storage, you need to provide the (secret) key with
> {{fs.azure.account.key.<account>.core.windows.net}}
> This value, however, is not hidden from the global configuration which only 
> specifies configurations with keys containing "password" or "secret" as 
> sensitive.
> We should add {{fs.azure.account.key}} to that list as well.





[jira] [Updated] (FLINK-15113) fs.azure.account.key not hidden from global configuration

2019-12-06 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-15113:

Fix Version/s: 1.9.2
   1.8.3

> fs.azure.account.key not hidden from global configuration
> -
>
> Key: FLINK-15113
> URL: https://issues.apache.org/jira/browse/FLINK-15113
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.1
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.8.3, 1.9.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For access to Azure's Blob Storage, you need to provide the (secret) key with
> {{fs.azure.account.key.<account>.core.windows.net}}
> This value, however, is not hidden from the global configuration which only 
> specifies configurations with keys containing "password" or "secret" as 
> sensitive.
> We should add {{fs.azure.account.key}} to that list as well.





[jira] [Updated] (FLINK-15113) fs.azure.account.key not hidden from global configuration

2019-12-06 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-15113:

Description: 
For access to Azure's Blob Storage, you need to provide the (secret) key with

{{fs.azure.account.key.<account>.core.windows.net}}

This value, however, is not hidden from the global configuration which only 
specifies configurations with keys containing "password" or "secret" as 
sensitive.

We should add {{fs.azure.account.key}} to that list as well.

  was:
For access to Azrue's Blob Storage, you need to provide the (secret) key with

{{fs.azure.account.key.<account>.core.windows.net}}

This value, however, is not hidden from the global configuration which only 
specifies configurations with keys containing "password" or "secret" as 
sensitive.

We should add {{fs.azure.account.key}} to that list as well.


> fs.azure.account.key not hidden from global configuration
> -
>
> Key: FLINK-15113
> URL: https://issues.apache.org/jira/browse/FLINK-15113
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.1
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
> Fix For: 1.10.0
>
>
> For access to Azure's Blob Storage, you need to provide the (secret) key with
> {{fs.azure.account.key.<account>.core.windows.net}}
> This value, however, is not hidden from the global configuration which only 
> specifies configurations with keys containing "password" or "secret" as 
> sensitive.
> We should add {{fs.azure.account.key}} to that list as well.





[jira] [Created] (FLINK-15113) fs.azure.account.key not hidden from global configuration

2019-12-06 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-15113:
---

 Summary: fs.azure.account.key not hidden from global configuration
 Key: FLINK-15113
 URL: https://issues.apache.org/jira/browse/FLINK-15113
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Affects Versions: 1.9.1
Reporter: Nico Kruber
Assignee: Nico Kruber
 Fix For: 1.10.0


For access to Azrue's Blob Storage, you need to provide the (secret) key with

{{fs.azure.account.key.<account>.core.windows.net}}

This value, however, is not hidden from the global configuration which only 
specifies configurations with keys containing "password" or "secret" as 
sensitive.

We should add {{fs.azure.account.key}} to that list as well.





[jira] [Created] (FLINK-15068) Disable RocksDB's local LOG by default

2019-12-05 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-15068:
---

 Summary: Disable RocksDB's local LOG by default
 Key: FLINK-15068
 URL: https://issues.apache.org/jira/browse/FLINK-15068
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Affects Versions: 1.9.1, 1.8.2, 1.7.2
Reporter: Nico Kruber
Assignee: Nico Kruber
 Fix For: 1.10.0


With Flink's default settings for RocksDB, it will write a log file (not the 
WAL, but pure logging statements) into the data folder. Besides periodic 
statistics, it will log compaction attempts, new memtable creations, flushes, 
etc.

A few things to note about this practice:
 # *this LOG file is growing over time with no limit (!)*
 # the default logging level is INFO
 # the statistics in there may help when looking into performance and/or disk 
space problems (but maybe you should be monitoring metrics instead)
 # this file is not useful for debugging errors since it will be deleted along 
with the local dir when the TM goes down

With a custom {{OptionsFactory}}, the user can change the behaviour like the 
following:
{code:java}
@Override
public DBOptions createDBOptions(DBOptions currentOptions) {
    currentOptions = super.createDBOptions(currentOptions);

    currentOptions.setKeepLogFileNum(10);
    currentOptions.setInfoLogLevel(InfoLogLevel.WARN_LEVEL);
    currentOptions.setStatsDumpPeriodSec(0);
    currentOptions.setMaxLogFileSize(1024 * 1024); // 1 MB each

    return currentOptions;
}{code}
However, the rotating logger currently does not work (it will not delete old 
log files - see [https://github.com/dataArtisans/frocksdb/pull/12]). Also, the 
user should not have to write their own {{OptionsFactory}} to get a sensible 
default.

To prevent this file from filling up the disk, I propose to change Flink's 
default RocksDB settings so that the LOG file is effectively disabled (nothing 
is written to it by default).
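To illustrate what {{setMaxLogFileSize}} and {{setKeepLogFileNum}} are meant to 
achieve, here is a size- and count-bounded log rotation sketched in plain Java 
(illustrative only; RocksDB's actual implementation is unrelated):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LogRotation {
    private final Path dir;
    private final long maxFileSize;   // rotate once the current file exceeds this
    private final int keepLogFileNum; // delete the oldest files beyond this count

    public LogRotation(Path dir, long maxFileSize, int keepLogFileNum) {
        this.dir = dir;
        this.maxFileSize = maxFileSize;
        this.keepLogFileNum = keepLogFileNum;
    }

    public void append(String line) throws IOException {
        Path current = dir.resolve("LOG");
        Files.write(current, (line + "\n").getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        if (Files.size(current) > maxFileSize) {
            // roll the current file to a timestamped name, then prune old ones
            Files.move(current, dir.resolve("LOG." + System.nanoTime()));
            prune();
        }
    }

    private void prune() throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            List<Path> rotated = files
                    .filter(p -> p.getFileName().toString().startsWith("LOG."))
                    .sorted(Comparator.comparing(p -> p.getFileName().toString()))
                    .collect(Collectors.toList());
            while (rotated.size() > keepLogFileNum) {
                Files.delete(rotated.remove(0)); // oldest first
            }
        }
    }
}
```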





[jira] [Commented] (FLINK-15012) Checkpoint directory not cleaned up

2019-12-03 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986760#comment-16986760
 ] 

Nico Kruber commented on FLINK-15012:
-

Well, we do have a lot of temp directories that will be deleted with 
{{stop-cluster.sh}}, e.g. blobStorage or flink-io.

However, the checkpoint directory may be special because it is shared between 
the JobManager and the TaskManager processes. Even if the JobManager cleans 
this up, some TaskManager could still be writing to it in case a checkpoint was 
concurrently being created. I did not try, but I am a bit concerned whether 
this may happen in a real cluster setup as well, for example in K8s where you 
may kill the Flink cluster (along with all running jobs) through K8s. Since we 
claim that the checkpoint lifecycle is managed by Flink, it should actually 
always do the cleanup*

 

Looking at the code you linked for ZooKeeperCompletedCheckpointStore, as well as 
how StandaloneCompletedCheckpointStore implements its {{shutdown()}} method, I 
am also wondering why they only clean up completed checkpoints. Shouldn't they 
also clean up in-progress checkpoints (if possible)?

 

* There may be some strings attached, but then they would need to be documented 
so that DevOps can account for them and, if necessary, do a manual cleanup (if 
the checkpoint path lets them identify what to delete).

> Checkpoint directory not cleaned up
> ---
>
> Key: FLINK-15012
> URL: https://issues.apache.org/jira/browse/FLINK-15012
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.9.1
>Reporter: Nico Kruber
>Priority: Major
>
> I started a Flink cluster with 2 TMs using {{start-cluster.sh}} and the 
> following config (in addition to the default {{flink-conf.yaml}})
> {code:java}
> state.checkpoints.dir: file:///path/to/checkpoints/
> state.backend: rocksdb {code}
> After submitting a job with checkpoints enabled (every 5s), checkpoints show 
> up, e.g.
> {code:java}
> bb969f842bbc0ecc3b41b7fbe23b047b/
> ├── chk-2
> │   ├── 238969e1-6949-4b12-98e7-1411c186527c
> │   ├── 2702b226-9cfc-4327-979d-e5508ab2e3d5
> │   ├── 4c51cb24-6f71-4d20-9d4c-65ed6e826949
> │   ├── e706d574-c5b2-467a-8640-1885ca252e80
> │   └── _metadata
> ├── shared
> └── taskowned {code}
> If I shut down the cluster via {{stop-cluster.sh}}, these files will remain 
> on disk and not be cleaned up.
> In contrast, if I cancel the job, at least {{chk-2}} will be deleted, but 
> still leaving the (empty) directories.





[jira] [Closed] (FLINK-15011) RocksDB temp directory not cleaned up

2019-12-03 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-15011.
---
Resolution: Duplicate

> RocksDB temp directory not cleaned up
> -
>
> Key: FLINK-15011
> URL: https://issues.apache.org/jira/browse/FLINK-15011
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.9.1
>Reporter: Nico Kruber
>Priority: Major
>
> When starting a Flink cluster with 2 TMs, then starting a job with RocksDB 
> with
> {code:java}
> state.backend: rocksdb {code}
> it will create temp directories {{rocksdb-lib-*}} where it extracts the 
> native libraries to. After shutting down the Flink cluster, these directories 
> remain (but their contents are cleaned up at least).





[jira] [Commented] (FLINK-14378) Cleanup rocksDB lib folder if fail to load library

2019-12-03 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986741#comment-16986741
 ] 

Nico Kruber commented on FLINK-14378:
-

I believe a proper cleanup should cover both scenarios and a fix for this one 
probably also fixes the other issue. I'm closing FLINK-15011 as a duplicate.

 

Just to clarify here: we should also clean up the {{rocksdb-lib-*}} 
directories upon graceful shutdown.

> Cleanup rocksDB lib folder if fail to load library
> --
>
> Key: FLINK-14378
> URL: https://issues.apache.org/jira/browse/FLINK-14378
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
>
> This improvement is inspired by the fact that some of our machines need some 
> time to load the RocksDB library. When other unrecoverable exceptions keep 
> happening, the process of loading the library can be interrupted, which causes 
> the {{rocksdb-lib}} folder to be created but not cleaned up. As the job 
> continues to fail over, more and more {{rocksdb-lib}} folders are created. We 
> have even seen a machine run out of inodes!
> Details can be found in the current 
> [implementation|https://github.com/apache/flink/blob/80b27a150026b7b5cb707bd9fa3e17f565bb8112/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBStateBackend.java#L860]





[jira] [Created] (FLINK-15012) Checkpoint directory not cleaned up

2019-12-02 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-15012:
---

 Summary: Checkpoint directory not cleaned up
 Key: FLINK-15012
 URL: https://issues.apache.org/jira/browse/FLINK-15012
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Affects Versions: 1.9.1
Reporter: Nico Kruber


I started a Flink cluster with 2 TMs using {{start-cluster.sh}} and the 
following config (in addition to the default {{flink-conf.yaml}})
{code:java}
state.checkpoints.dir: file:///path/to/checkpoints/
state.backend: rocksdb {code}
After submitting a job with checkpoints enabled (every 5s), checkpoints show up, 
e.g.
{code:java}
bb969f842bbc0ecc3b41b7fbe23b047b/
├── chk-2
│   ├── 238969e1-6949-4b12-98e7-1411c186527c
│   ├── 2702b226-9cfc-4327-979d-e5508ab2e3d5
│   ├── 4c51cb24-6f71-4d20-9d4c-65ed6e826949
│   ├── e706d574-c5b2-467a-8640-1885ca252e80
│   └── _metadata
├── shared
└── taskowned {code}
If I shut down the cluster via {{stop-cluster.sh}}, these files will remain on 
disk and not be cleaned up.

In contrast, if I cancel the job, at least {{chk-2}} will be deleted, but still 
leaving the (empty) directories.





[jira] [Created] (FLINK-15011) RocksDB temp directory not cleaned up

2019-12-02 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-15011:
---

 Summary: RocksDB temp directory not cleaned up
 Key: FLINK-15011
 URL: https://issues.apache.org/jira/browse/FLINK-15011
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Affects Versions: 1.9.1
Reporter: Nico Kruber


When starting a Flink cluster with 2 TMs, then starting a job with RocksDB with
{code:java}
state.backend: rocksdb {code}
it will create temp directories {{rocksdb-lib-*}} where it extracts the 
native libraries to. After shutting down the Flink cluster, these directories 
remain (but their contents are cleaned up at least).





[jira] [Created] (FLINK-15010) Temp directories flink-netty-shuffle-* are not cleaned up

2019-12-02 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-15010:
---

 Summary: Temp directories flink-netty-shuffle-* are not cleaned up
 Key: FLINK-15010
 URL: https://issues.apache.org/jira/browse/FLINK-15010
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Network
Affects Versions: 1.9.1
Reporter: Nico Kruber


Starting a Flink cluster with 2 TMs and stopping it again will leave 2 
temporary directories (and not delete them): flink-netty-shuffle-*





[jira] [Assigned] (FLINK-14890) TestHarness for KeyedBroadcastProcessFunction

2019-11-21 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reassigned FLINK-14890:
---

Assignee: Alexander Fedulov

> TestHarness for KeyedBroadcastProcessFunction
> -
>
> Key: FLINK-14890
> URL: https://issues.apache.org/jira/browse/FLINK-14890
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.9.1
>Reporter: Jun Qin
>Assignee: Alexander Fedulov
>Priority: Minor
>
> To test {{KeyedCoProcessFunction}}, one can use {{KeyedCoProcessOperator}} 
> and {{KeyedTwoInputStreamOperatorTestHarness}}. To test 
> {{KeyedBroadcastProcessFunction}}, I see {{CoBroadcastWithKeyedOperator}}, 
> but the corresponding TestHarness class is missing.





[jira] [Assigned] (FLINK-14825) Rework state processor api documentation

2019-11-15 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reassigned FLINK-14825:
---

Assignee: Seth Wiesman

> Rework state processor api documentation
> 
>
> Key: FLINK-14825
> URL: https://issues.apache.org/jira/browse/FLINK-14825
> Project: Flink
>  Issue Type: Improvement
>  Components: API / State Processor, Documentation
>Reporter: Seth Wiesman
>Assignee: Seth Wiesman
>Priority: Major
> Fix For: 1.10.0
>
>
> The current version of the SPA (State Processor API) docs was rushed to meet 
> the 1.9 release. We should rewrite them to be more complete, include better 
> examples, etc.





[jira] [Updated] (FLINK-13727) Build docs with jekyll 4.0.0 (final)

2019-11-15 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13727:

Description: 
When Jekyll 4.0.0 is out, we should upgrade to this final version and 
discontinue using the beta.

When we make this final, we could also follow these official recommendations:
{quote}
-
This version of Jekyll comes with some major changes.

Most notably:
  * Our `link` tag now comes with the `relative_url` filter incorporated into 
it.
You should no longer prepend `{{ site.baseurl }}` to `{% link foo.md %}`
For further details: https://github.com/jekyll/jekyll/pull/6727

  * Our `post_url` tag now comes with the `relative_url` filter incorporated 
into it.
You shouldn't prepend `{{ site.baseurl }}` to `{% post_url 2019-03-27-hello %}`
For further details: https://github.com/jekyll/jekyll/pull/7589

  * Support for deprecated configuration options has been removed. We will no 
longer
output a warning and gracefully assign their values to the newer 
counterparts
internally.
-
{quote}

  was:
When Jekyll 4.0.0 is out, we should upgrade to this final version and 
discontinue using the beta.

When we make this final, we could also follow these official recommendations:
{quote}
-
This version of Jekyll comes with some major changes.

Most notably:
  * Our `link` tag now comes with the `relative_url` filter incorporated into 
it.
You should no longer prepend `{{ site.baseurl }}` to `{% link foo.md %}`
For further details: https://github.com/jekyll/jekyll/pull/6727

  * Our `post_url` tag now comes with the `relative_url` filter incorporated 
into it.
You shouldn't prepend `{{ site.baseurl }}` to `{% post_url 2019-03-27-hello 
%}`
For further details: https://github.com/jekyll/jekyll/pull/7589

  * Support for deprecated configuration options has been removed. We will no 
longer
output a warning and gracefully assign their values to the newer 
counterparts
internally.
-
{quote}


> Build docs with jekyll 4.0.0 (final)
> 
>
> Key: FLINK-13727
> URL: https://issues.apache.org/jira/browse/FLINK-13727
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When Jekyll 4.0.0 is out, we should upgrade to this final version and 
> discontinue using the beta.
> When we make this final, we could also follow these official recommendations:
> {quote}
> -
> This version of Jekyll comes with some major changes.
> Most notably:
>   * Our `link` tag now comes with the `relative_url` filter incorporated into 
> it.
> You should no longer prepend `{{ site.baseurl }}` to `{% link foo.md %}`
> For further details: https://github.com/jekyll/jekyll/pull/6727
>   * Our `post_url` tag now comes with the `relative_url` filter incorporated 
> into it.
> You shouldn't prepend `{{ site.baseurl }}` to `{% post_url 2019-03-27-hello %}`
> For further details: https://github.com/jekyll/jekyll/pull/7589
>   * Support for deprecated configuration options has been removed. We will no 
> longer
> output a warning and gracefully assign their values to the newer 
> counterparts
> internally.
> -
> {quote}





[jira] [Commented] (FLINK-13722) Speed up documentation generation

2019-11-14 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974484#comment-16974484
 ] 

Nico Kruber commented on FLINK-13722:
-

[~chesnay] Thanks a lot - it actually works well now and the new test setup is 
very nice.

> Speed up documentation generation
> -
>
> Key: FLINK-13722
> URL: https://issues.apache.org/jira/browse/FLINK-13722
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>
> Creating the documentation via {{build_docs.sh}} currently takes about 150s!





[jira] [Closed] (FLINK-13791) Speed up sidenav by using group_by

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-13791.
---
Fix Version/s: 1.9.2
   1.8.3
   Resolution: Fixed

Fixed via:
- master: a8868dd2219468c4528011c85551a33b4fe0ee9b
- release-1.9: 4e86b3efc01811f46355533bba5cb980e4140b2e
- release-1.8: 0b43d8dc9b9dfb0b8cfe255ff95f05b55fd4f85a

> Speed up sidenav by using group_by
> --
>
> Key: FLINK-13791
> URL: https://issues.apache.org/jira/browse/FLINK-13791
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.8.3, 1.9.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{_includes/sidenav.html}} parses through {{pages_by_language}} over and over 
> again trying to find children when building the (recursive) side navigation. 
> We could do this once with a {{group_by}} instead.





[jira] [Closed] (FLINK-13726) Build docs with jekyll 4.0.0.pre.beta1

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-13726.
---
Fix Version/s: 1.9.2
   1.8.3
   Resolution: Fixed

Fixed via:
- master: cb7e9049491c139c008fa6755a38df7073dacec1
- release-1.9: 345abdff83420cc8f84231f32732352677eb8c91
- release-1.8: adbf065ec2660ee63e282f0b6831d41d77d75f46

> Build docs with jekyll 4.0.0.pre.beta1
> --
>
> Key: FLINK-13726
> URL: https://issues.apache.org/jira/browse/FLINK-13726
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.8.3, 1.9.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Jekyll 4 is way faster in generating the docs than jekyll 3 - probably due to 
> the newly introduced cache. Site generation time goes down by roughly a 
> factor of 2.5 even with the current beta version!





[jira] [Closed] (FLINK-13725) Use sassc for faster doc generation

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-13725.
---
Fix Version/s: 1.9.2
   1.8.3
   Resolution: Fixed

Fixed via:
- master: 135472e7f52c91ecbef8d8a331372daf9c4464ef
- release-1.9: 129a21b74b4efce8897a31aa3bb1ea403f140b58
- release-1.8: ff954b568f9068d9fcb3f5c007dec296831ded0e

> Use sassc for faster doc generation
> ---
>
> Key: FLINK-13725
> URL: https://issues.apache.org/jira/browse/FLINK-13725
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.8.3, 1.9.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Jekyll requires {{sass}} but can optionally also use a C-based implementation 
> provided by {{sassc}}. Although we do not use sass directly, there may be 
> some indirect use inside jekyll. It doesn't seem to hurt to upgrade here.





[jira] [Commented] (FLINK-13729) Update website generation dependencies

2019-11-14 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974446#comment-16974446
 ] 

Nico Kruber commented on FLINK-13729:
-

Fixed via:
- master: 3f0f6f23f8a9559f706f4bc63d7806498ec4c128
- release-1.9: 8385159bb800cc8a17c0fad00db45856191d4090
- release-1.8: 70640a88b1f40aec7378a14ede3de1dde109f917

> Update website generation dependencies
> --
>
> Key: FLINK-13729
> URL: https://issues.apache.org/jira/browse/FLINK-13729
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The website generation dependencies are quite old. By upgrading some of them 
> we get improvements like much nicer code highlighting and prepare for the 
> Jekyll updates of FLINK-13726 and FLINK-13727.





[jira] [Closed] (FLINK-13729) Update website generation dependencies

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-13729.
---
Fix Version/s: 1.9.2
   1.8.3
   Resolution: Fixed

> Update website generation dependencies
> --
>
> Key: FLINK-13729
> URL: https://issues.apache.org/jira/browse/FLINK-13729
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.8.3, 1.9.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The website generation dependencies are quite old. By upgrading some of them 
> we get improvements like much nicer code highlighting and prepare for the 
> Jekyll updates of FLINK-13726 and FLINK-13727.





[jira] [Updated] (FLINK-13723) Use liquid-c for faster doc generation

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13723:

Fix Version/s: 1.9.2
   1.8.3

> Use liquid-c for faster doc generation
> --
>
> Key: FLINK-13723
> URL: https://issues.apache.org/jira/browse/FLINK-13723
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.8.3, 1.9.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Jekyll requires {{liquid}} and only optionally uses {{liquid-c}} if 
> available. The latter uses natively-compiled code and reduces generation time 
> by ~5% for me.





[jira] [Commented] (FLINK-13723) Use liquid-c for faster doc generation

2019-11-14 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974441#comment-16974441
 ] 

Nico Kruber commented on FLINK-13723:
-

Fixed via:
- release-1.8: d9b0c4ba032000cea992d4a3eccf96b2cc6b8f43
- release-1.9: 4b1ef4dfe9155fb27bccf04fdc7bd9d4877bf93f

> Use liquid-c for faster doc generation
> --
>
> Key: FLINK-13723
> URL: https://issues.apache.org/jira/browse/FLINK-13723
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Jekyll requires {{liquid}} and only optionally uses {{liquid-c}} if 
> available. The latter uses natively-compiled code and reduces generation time 
> by ~5% for me.





[jira] [Updated] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13724:

Fix Version/s: (was: 1.9.1)
   1.9.2

> Remove unnecessary whitespace from the docs' sidenav
> 
>
> Key: FLINK-13724
> URL: https://issues.apache.org/jira/browse/FLINK-13724
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.8.3, 1.9.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The side navigation generates quite some white space that will end up in 
> every HTML page. Removing this reduces final page sizes and also improves 
> site generation speed.





[jira] [Updated] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13724:

Fix Version/s: 1.9.1
   1.8.3

> Remove unnecessary whitespace from the docs' sidenav
> 
>
> Key: FLINK-13724
> URL: https://issues.apache.org/jira/browse/FLINK-13724
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.9.1, 1.8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The side navigation generates quite some white space that will end up in 
> every HTML page. Removing this reduces final page sizes and also improves 
> site generation speed.





[jira] [Commented] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav

2019-11-14 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974379#comment-16974379
 ] 

Nico Kruber commented on FLINK-13724:
-

Fixed in
- release-1.9: 5bfbfc9d0ad1120da001f9911dca6834fb3a788c
- release-1.8: 6d55ababf05f24070053cafd177c9b69cabeff60

> Remove unnecessary whitespace from the docs' sidenav
> 
>
> Key: FLINK-13724
> URL: https://issues.apache.org/jira/browse/FLINK-13724
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The side navigation generates quite some white space that will end up in 
> every HTML page. Removing this reduces final page sizes and also improves 
> site generation speed.





[jira] [Commented] (FLINK-13728) Fix wrong closing tag order in sidenav

2019-11-14 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974378#comment-16974378
 ] 

Nico Kruber commented on FLINK-13728:
-

Fixed in 1.8.3 via bd6b2e2eb527392e7b6100089fd83c212e976705

> Fix wrong closing tag order in sidenav
> --
>
> Key: FLINK-13728
> URL: https://issues.apache.org/jira/browse/FLINK-13728
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.8.1, 1.9.0, 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.9.1, 1.8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The order of closing HTML tags in the sidenav is wrong: the nested list's 
> {{</ul>}} and its enclosing {{</li>}} are emitted in the wrong order.





[jira] [Updated] (FLINK-13728) Fix wrong closing tag order in sidenav

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13728:

Fix Version/s: 1.8.3

> Fix wrong closing tag order in sidenav
> --
>
> Key: FLINK-13728
> URL: https://issues.apache.org/jira/browse/FLINK-13728
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.8.1, 1.9.0, 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.9.1, 1.8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The order of closing HTML tags in the sidenav is wrong: the nested list's 
> {{</ul>}} and its enclosing {{</li>}} are emitted in the wrong order.





[jira] [Assigned] (FLINK-14781) [ZH] clarify that a RocksDB dependency in pom.xml may not be needed

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reassigned FLINK-14781:
---

Assignee: rdyang

> [ZH] clarify that a RocksDB dependency in pom.xml may not be needed
> ---
>
> Key: FLINK-14781
> URL: https://issues.apache.org/jira/browse/FLINK-14781
> Project: Flink
>  Issue Type: Bug
>  Components: chinese-translation, Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: rdyang
>Priority: Major
>
> The English version was clarified with respect to when and how to add the 
> Maven dependencies via 
> https://github.com/apache/flink/commit/d36ce5ff77fae2b01b8fbe8e5c15d610de8ed9f5.
> The Chinese version still needs that update.





[jira] [Updated] (FLINK-14781) [ZH] clarify that a RocksDB dependency in pom.xml may not be needed

2019-11-14 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-14781:

Summary: [ZH] clarify that a RocksDB dependency in pom.xml may not be 
needed  (was: clarify that a RocksDB dependency in pom.xml may not be needed)

> [ZH] clarify that a RocksDB dependency in pom.xml may not be needed
> ---
>
> Key: FLINK-14781
> URL: https://issues.apache.org/jira/browse/FLINK-14781
> Project: Flink
>  Issue Type: Bug
>  Components: chinese-translation, Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Priority: Major
>
> The English version was clarified with respect to when and how to add the 
> Maven dependencies via 
> https://github.com/apache/flink/commit/d36ce5ff77fae2b01b8fbe8e5c15d610de8ed9f5.
> The Chinese version still needs that update.





[jira] [Created] (FLINK-14781) clarify that a RocksDB dependency in pom.xml may not be needed

2019-11-14 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-14781:
---

 Summary: clarify that a RocksDB dependency in pom.xml may not be 
needed
 Key: FLINK-14781
 URL: https://issues.apache.org/jira/browse/FLINK-14781
 Project: Flink
  Issue Type: Bug
  Components: chinese-translation, Documentation
Affects Versions: 1.10.0
Reporter: Nico Kruber


The English version was clarified with respect to when and how to add the 
Maven dependencies via 
https://github.com/apache/flink/commit/d36ce5ff77fae2b01b8fbe8e5c15d610de8ed9f5.
The Chinese version still needs that update.





[jira] [Created] (FLINK-14575) Wrong (parent-first) class loader during serialization while submitting jobs

2019-10-30 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-14575:
---

 Summary: Wrong (parent-first) class loader during serialization 
while submitting jobs
 Key: FLINK-14575
 URL: https://issues.apache.org/jira/browse/FLINK-14575
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Affects Versions: 1.9.1, 1.8.2
Reporter: Nico Kruber


When building the user code classloader for job submission, Flink uses a 
parent-first class loader for serializing the ExecutionConfig, which may lead to 
problems in the following case:

# have Hadoop in the system class loader from lib/ (this also provides Avro 
1.8.3)
# have a user jar with a newer Avro, e.g. 1.9.1
# register an Avro class with the execution config, e.g. through 
{{registerPojoType}} (please ignore for a second that this is not needed)

During submission, a parent-first classloader will be used and thus, Avro 1.8.3 
will be used, which does not match the version in the user classloader that will 
be used for deserialization.

Exception during submission:

{code}
Caused by: java.io.InvalidClassException: 
org.apache.avro.specific.SpecificRecordBase; local class incompatible: stream 
classdesc serialVersionUID = 189988654766568477, local class serialVersionUID = 
-1463700717714793795
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1716)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1556)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at java.util.HashSet.readObject(HashSet.java:341)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at 
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:566)
at 
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:552)
at 
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:540)
at 
org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
at 
org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:278)
at 
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:83)
at 
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:37)
at 
org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:146)
... 10 more
{code}

The incriminating code is in
* Flink 1.8.0: 
{{org.apache.flink.client.program.JobWithJars#buildUserCodeClassLoader}}
* Flink master: {{org.apache.flink.client.ClientUtils#buildUserCodeClassLoader}}

Thanks [~chesnay] for looking into this with me. [~aljoscha] Do you know why we 
use parent-first there?
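As a side note on the mechanics (not from this ticket): the "local class incompatible" failure above arises because, when a Serializable class declares no serialVersionUID, the JVM computes one from the class's shape, which changes between library versions. A minimal, hypothetical sketch of pinning the UID explicitly:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class StableRecord implements Serializable {
    // Without an explicit declaration, the JVM derives a serialVersionUID
    // from the class's fields and methods, so two library versions of the
    // "same" class can end up with different UIDs -- exactly the mismatch
    // reported in the stack trace above.
    private static final long serialVersionUID = 1L;

    public static void main(String[] args) {
        long uid = ObjectStreamClass.lookup(StableRecord.class)
                .getSerialVersionUID();
        System.out.println(uid); // 1
    }
}
```

This only illustrates why the UIDs diverge; in the scenario above the actual remedy is consistent classloading (child-first for the user jar) or matching Avro versions, not editing third-party classes.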





[jira] [Closed] (FLINK-7002) Partitioning broken if enum is used in compound key specified using field expression

2019-10-04 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-7002.
--
Resolution: Won't Fix

Actually, this is not a Flink issue, but an issue of enums in Java: their 
{{hashCode}} implementation relies on the enum instance's memory address and 
may therefore differ between JVMs.

You could instead use the enum's ordinal or its name in the key selector 
implementation.

Please also refer to this for some more info:
https://stackoverflow.com/questions/49140654/flink-error-key-group-is-not-in-keygrouprange
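A minimal sketch of the suggested workaround (hypothetical names, not from the ticket): derive the key from a stable property of the enum rather than the enum instance itself.

```java
public class EnumKeyExample {
    enum EventType { CLICK, VIEW }

    public static void main(String[] args) {
        // An enum's default hashCode() is identity-based and can differ
        // between JVMs, so keying on the enum value directly is unsafe
        // across a distributed shuffle. The enum's name (or ordinal) is
        // stable in every JVM and makes a safe key:
        String key = EventType.CLICK.name();
        System.out.println(key); // CLICK
    }
}
```

In a Flink job this would be the body of the KeySelector: return the name/ordinal (or a string built from it plus the other key fields) instead of the enum itself.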

> Partitioning broken if enum is used in compound key specified using field 
> expression
> 
>
> Key: FLINK-7002
> URL: https://issues.apache.org/jira/browse/FLINK-7002
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.2.0, 1.3.1
>Reporter: Sebastian Klemke
>Priority: Major
> Attachments: TestJob.java, WorkingTestJob.java, testdata.avro
>
>
> When groupBy() or keyBy() is used with multiple field expressions, at least 
> one of them being an enum type serialized using EnumTypeInfo, partitioning 
> seems random, resulting in incorrectly grouped/keyed output 
> datasets/datastreams.
> The attached Flink DataSet API jobs and the test dataset detail the issue: 
> Both jobs count (id, type) occurrences, TestJob uses field expressions to 
> group, WorkingTestJob uses a KeySelector function.
> Expected output for both is 6 records, with frequency value 100_000 each. If 
> you run in LocalEnvironment, results are in fact equivalent. But when run on 
> a cluster with 5 TaskManagers, only KeySelector function with String key 
> produces correct results whereas field expressions produce random, 
> non-repeatable, wrong results.





[jira] [Updated] (FLINK-5334) outdated scala SBT quickstart example

2019-09-23 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-5334:
---
Component/s: Documentation

> outdated scala SBT quickstart example
> -
>
> Key: FLINK-5334
> URL: https://issues.apache.org/jira/browse/FLINK-5334
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Quickstarts
>Affects Versions: 1.7.0
>Reporter: Nico Kruber
>Priority: Major
>
> The scala quickstart set up via sbt-quickstart.sh or from the repository at 
> https://github.com/tillrohrmann/flink-project seems outdated compared to what 
> is set up with the maven archetype, e.g. Job.scala vs. BatchJob.scala and 
> StreamingJob.scala. This should probably be updated; also, the hard-coded 
> example in sbt-quickstart.sh on the web page could be removed in favor of 
> downloading the newest version, as the mvn command does.
> see 
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/scala_api_quickstart.html
>  for these two paths (SBT vs. Maven)





[jira] [Updated] (FLINK-5334) outdated scala SBT quickstart example

2019-09-23 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-5334:
---
Affects Version/s: 1.8.2
   1.9.0

> outdated scala SBT quickstart example
> -
>
> Key: FLINK-5334
> URL: https://issues.apache.org/jira/browse/FLINK-5334
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Quickstarts
>Affects Versions: 1.7.0, 1.8.2, 1.9.0
>Reporter: Nico Kruber
>Priority: Major
>
> The scala quickstart set up via sbt-quickstart.sh or from the repository at 
> https://github.com/tillrohrmann/flink-project seems outdated compared to what 
> is set up with the maven archetype, e.g. Job.scala vs. BatchJob.scala and 
> StreamingJob.scala. This should probably be updated; also, the hard-coded 
> example in sbt-quickstart.sh on the web page could be removed in favor of 
> downloading the newest version, as the mvn command does.
> see 
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/scala_api_quickstart.html
>  for these two paths (SBT vs. Maven)





[jira] [Commented] (FLINK-5334) outdated scala SBT quickstart example

2019-09-23 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936183#comment-16936183
 ] 

Nico Kruber commented on FLINK-5334:


Actually, the script asks for a Flink and Scala version but then does not take 
them into account when creating the example project.

> outdated scala SBT quickstart example
> -
>
> Key: FLINK-5334
> URL: https://issues.apache.org/jira/browse/FLINK-5334
> Project: Flink
>  Issue Type: Bug
>  Components: Quickstarts
>Affects Versions: 1.7.0
>Reporter: Nico Kruber
>Priority: Major
>
> The scala quickstart set up via sbt-quickstart.sh or from the repository at 
> https://github.com/tillrohrmann/flink-project seems outdated compared to what 
> is set up with the maven archetype, e.g. Job.scala vs. BatchJob.scala and 
> StreamingJob.scala. This should probably be updated; also, the hard-coded 
> example in sbt-quickstart.sh on the web page could be removed in favor of 
> downloading the newest version, as the mvn command does.
> see 
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/scala_api_quickstart.html
>  for these two paths (SBT vs. Maven)





[jira] [Created] (FLINK-14104) Bump Jackson to 2.9.9.3

2019-09-17 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-14104:
---

 Summary: Bump Jackson to 2.9.9.3
 Key: FLINK-14104
 URL: https://issues.apache.org/jira/browse/FLINK-14104
 Project: Flink
  Issue Type: Bug
  Components: BuildSystem / Shaded
Affects Versions: shaded-8.0, shaded-7.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Our current Jackson version (2.9.8) is vulnerable to at least this CVE:
https://nvd.nist.gov/vuln/detail/CVE-2019-14379

Bumping to 2.9.9.3 should solve it.
See https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.9



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13771) Support kqueue Netty transports (MacOS)

2019-09-05 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923797#comment-16923797
 ] 

Nico Kruber commented on FLINK-13771:
-

[~aitozi] I'm not working on this and I also do not know how much it is worth 
since Mac servers (running Flink, in particular) are not really widespread, 
afaik. However, the actual implementation overhead should be low.

I'll assign you to the issue and can have a look at the PR when you are done.

> Support kqueue Netty transports (MacOS)
> ---
>
> Key: FLINK-13771
> URL: https://issues.apache.org/jira/browse/FLINK-13771
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Nico Kruber
>Priority: Major
>
> It seems like Netty is now also supporting MacOS's native transport 
> {{kqueue}}:
> https://netty.io/wiki/native-transports.html#using-the-macosbsd-native-transport
> We should allow this via {{taskmanager.network.netty.transport}}.





[jira] [Assigned] (FLINK-13771) Support kqueue Netty transports (MacOS)

2019-09-05 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reassigned FLINK-13771:
---

Assignee: Aitozi

> Support kqueue Netty transports (MacOS)
> ---
>
> Key: FLINK-13771
> URL: https://issues.apache.org/jira/browse/FLINK-13771
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Nico Kruber
>Assignee: Aitozi
>Priority: Major
>
> It seems like Netty is now also supporting MacOS's native transport 
> {{kqueue}}:
> https://netty.io/wiki/native-transports.html#using-the-macosbsd-native-transport
> We should allow this via {{taskmanager.network.netty.transport}}.





[jira] [Commented] (FLINK-12122) Spread out tasks evenly across all available registered TaskManagers

2019-09-04 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922710#comment-16922710
 ] 

Nico Kruber commented on FLINK-12122:
-

[~anton_ryabtsev]  true, memory could become an issue in certain scenarios. 
However, I don't get the GC part: native memory for threads shouldn't be part 
of GC, network buffers are pre-allocated and off-heap, and the task's load is, 
in sum, the same, just more widely spread.

> Spread out tasks evenly across all available registered TaskManagers
> 
>
> Key: FLINK-12122
> URL: https://issues.apache.org/jira/browse/FLINK-12122
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Till Rohrmann
>Priority: Major
> Attachments: image-2019-05-21-12-28-29-538.png, 
> image-2019-05-21-13-02-50-251.png
>
>
> With Flip-6, we changed the default behaviour how slots are assigned to 
> {{TaskManages}}. Instead of evenly spreading it out over all registered 
> {{TaskManagers}}, we randomly pick slots from {{TaskManagers}} with a 
> tendency to first fill up a TM before using another one. This is a regression 
> wrt the pre Flip-6 code.
> I suggest to change the behaviour so that we try to evenly distribute slots 
> across all available {{TaskManagers}} by considering how many of their slots 
> are already allocated.





[jira] [Commented] (FLINK-10177) Use network transport type AUTO by default

2019-08-22 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913127#comment-16913127
 ] 

Nico Kruber commented on FLINK-10177:
-

One thing to consider/test: if, for whatever reason, one TM ended up with NIO 
and another with epoll, they should theoretically work together, but this 
should be verified. After all, this is just the local channel listening 
implementation.

On the other hand, most deployments should be homogeneous and therefore not end 
up in that scenario.

> Use network transport type AUTO by default
> --
>
> Key: FLINK-10177
> URL: https://issues.apache.org/jira/browse/FLINK-10177
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration, Runtime / Network
>Affects Versions: 1.6.0, 1.7.0
>Reporter: Nico Kruber
>Assignee: boshu Zheng
>Priority: Major
>
> Now that the shading issue with the native library is fixed (FLINK-9463), 
> EPOLL should be available on (all?) Linux distributions and provide some 
> efficiency gain (if enabled). Therefore, 
> {{taskmanager.network.netty.transport}} should be set to {{auto}} by default. 
> If EPOLL is not available, it will automatically fall back to NIO which 
> currently is the default.





[jira] [Commented] (FLINK-13770) Bump Netty to 4.1.39.Final

2019-08-22 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913124#comment-16913124
 ] 

Nico Kruber commented on FLINK-13770:
-

see https://issues.apache.org/jira/browse/FLINK-10177

> Bump Netty to 4.1.39.Final
> --
>
> Key: FLINK-13770
> URL: https://issues.apache.org/jira/browse/FLINK-13770
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I quickly went through all the changelogs for Netty 4.1.32 (which we
> currently use) to the latest Netty 4.1.39.Final. Below, you will find a
> list of bug fixes and performance improvements that may affect us. Nice
> changes we could benefit from, also for the Java > 8 efforts. The most
> important ones fixing leaks etc are #8921, #9167, #9274, #9394, and the
> various {{CompositeByteBuf}} fixes. The rest are mostly performance
> improvements.
> Since we are still early in the dev cycle for Flink 1.10, it would be
> nice to update now and verify that the new version works correctly.
> {code}
> Netty 4.1.33.Final
> - Fix ClassCastException and native crash when using kqueue transport
> (#8665)
> - Provide a way to cache the internal nioBuffer of the PooledByteBuffer
> to reduce GC (#8603)
> Netty 4.1.34.Final
> - Do not use GetPrimitiveArrayCritical(...) due to multiple unfixed bugs
> related to GCLocker (#8921)
> - Correctly monkey-patch id also when os / arch is used within the library
> name (#8913)
> - Further reduce ensureAccessible() overhead (#8895)
> - Support using an Executor to offload blocking / long-running tasks
> when processing TLS / SSL via the SslHandler (#8847)
> - Minimize memory footprint for AbstractChannelHandlerContext for
> handlers that execute in the EventExecutor (#8786)
> - Fix three bugs in CompositeByteBuf (#8773)
> Netty 4.1.35.Final
> - Fix possible ByteBuf leak when CompositeByteBuf is resized (#8946)
> - Correctly produce ssl alert when certificate validation fails on the
> client-side when using native SSL implementation (#8949)
> Netty 4.1.37.Final
> - Don't filter out TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (#9274)
> - Try to mark child channel writable again once the parent channel
> becomes writable (#9254)
> - Properly debounce wakeups (#9191)
> - Don't read from timerfd and eventfd on each EventLoop tick (#9192)
> - Correctly detect that KeyManagerFactory is not supported when using
> OpenSSL 1.1.0+ (#9170)
> - Fix possible unsafe sharing of internal NIO buffer in CompositeByteBuf
> (#9169)
> - KQueueEventLoop won't unregister active channels reusing a file
> descriptor (#9149)
> - Prefer direct io buffers if direct buffers pooled (#9167)
> Netty 4.1.38.Final
> - Prevent ByteToMessageDecoder from overreading when !isAutoRead (#9252)
> - Correctly take length of ByteBufInputStream into account for
> readLine() / readByte() (#9310)
> - availableSharedCapacity will be slowly exhausted (#9394)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (FLINK-12983) Replace descriptive histogram's storage back-end

2019-08-21 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-12983.
---
Fix Version/s: 1.10.0
   Resolution: Fixed

merged to master via f57a615

> Replace descriptive histogram's storage back-end
> 
>
> Key: FLINK-12983
> URL: https://issues.apache.org/jira/browse/FLINK-12983
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Metrics
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{DescriptiveStatistics}} relies on their {{ResizableDoubleArray}} for 
> storing double values for their histograms. However, this is constantly 
> resizing an internal array and seems to have quite some overhead.
> Additionally, we're not using {{SynchronizedDescriptiveStatistics}} which, 
> according to its docs, we should. Currently, we seem to be somewhat safe 
> because {{ResizableDoubleArray}} has some synchronized parts but these are 
> scheduled to go away with commons.math version 4.
> Internal tests with the current implementation, one based on a linear array 
> of twice the histogram size (and moving values back to the start once the 
> window reaches the end), and one using a circular array (wrapping around with 
> flexible start position) has shown these numbers using the optimised code 
> from FLINK-10236, FLINK-12981, and FLINK-12982:
> # only adding values to the histogram
> {code}
> Benchmark   Mode  Cnt  Score
> Error   Units
> HistogramBenchmarks.dropwizardHistogramAdd thrpt   30   47985.359 ±
> 25.847  ops/ms
> HistogramBenchmarks.descriptiveHistogramAddthrpt   30   70158.792 ±   
> 276.858  ops/ms
> --- with FLINK-10236, FLINK-12981, and FLINK-12982 ---
> HistogramBenchmarks.descriptiveHistogramAddthrpt   30   75303.040 ±   
> 475.355  ops/ms
> HistogramBenchmarks.descrHistogramCircularAdd  thrpt   30  200906.902 ±   
> 384.483  ops/ms
> HistogramBenchmarks.descrHistogramLinearAddthrpt   30  189788.728 ±   
> 233.283  ops/ms
> {code}
> # after adding each value, also retrieving a common set of metrics:
> {code}
> Benchmark   Mode  Cnt  Score
> Error   Units
> HistogramBenchmarks.dropwizardHistogramthrpt   30 400.274 ± 
> 4.930  ops/ms
> HistogramBenchmarks.descriptiveHistogram   thrpt   30 124.533 ± 
> 1.060  ops/ms
> --- with FLINK-10236, FLINK-12981, and FLINK-12982 ---
> HistogramBenchmarks.descriptiveHistogram   thrpt   30 251.895 ± 
> 1.809  ops/ms
> HistogramBenchmarks.descrHistogramCircular thrpt   30 301.068 ± 
> 2.077  ops/ms
> HistogramBenchmarks.descrHistogramLinear   thrpt   30 234.050 ± 
> 5.485  ops/ms
> {code}
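
The "circular" variant benchmarked above can be sketched as a fixed-size array with a wrapping write index, so adding a value never resizes or copies the backing storage. This is an illustrative sketch, not Flink's actual implementation; all names are hypothetical:

```java
// Sketch of a circular-array sliding window: writes wrap around instead of
// shifting or resizing the backing array.
public class CircularDoubleBuffer {
    private final double[] values;
    private int nextIndex = 0; // position where the next value is written
    private int count = 0;     // number of valid entries, capped at capacity

    public CircularDoubleBuffer(int capacity) {
        this.values = new double[capacity];
    }

    public void add(double value) {
        values[nextIndex] = value;
        nextIndex = (nextIndex + 1) % values.length; // wrap around, no copying
        if (count < values.length) {
            count++;
        }
    }

    /** Returns a copy of the current window contents, oldest value first. */
    public double[] snapshot() {
        double[] result = new double[count];
        int start = (count < values.length) ? 0 : nextIndex;
        for (int i = 0; i < count; i++) {
            result[i] = values[(start + i) % values.length];
        }
        return result;
    }

    public int size() {
        return count;
    }
}
```

The benchmark numbers above suggest why this wins over a constantly resizing array: {{add}} is a single array write plus index arithmetic.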



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (FLINK-12982) Make DescriptiveStatisticsHistogramStatistics a true point-in-time snapshot

2019-08-21 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-12982.
---
Fix Version/s: 1.10.0
   Resolution: Fixed

merged in master via 4452be3

> Make DescriptiveStatisticsHistogramStatistics a true point-in-time snapshot
> ---
>
> Key: FLINK-12982
> URL: https://issues.apache.org/jira/browse/FLINK-12982
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Metrics
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Instead of redirecting {{DescriptiveStatisticsHistogramStatistics}} calls to 
> {{DescriptiveStatistics}}, it takes a point-in-time snapshot using its own
> {{UnivariateStatistic}} implementation that
>  * calculates min, max, mean, and standard deviation in one go (as opposed to 
> four iterations over the values array!)
>  * caches pivots for the percentile calculation to speed up retrieval of 
> multiple percentiles/quartiles
> This is also similar to the semantics of our implementation using codahale's 
> {{DropWizard}}.
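
Computing min, max, mean, and standard deviation in one iteration can be sketched as below. This is an illustration of the single-pass idea, not the actual Flink code; Welford's online algorithm is used here for numerical stability (an assumption, the issue only says "in one go"):

```java
// Single pass over the values: min/max are tracked directly, mean and
// standard deviation via Welford's online algorithm.
public class OneGoStatistics {
    public final double min, max, mean, stddev;

    public OneGoStatistics(double[] values) {
        double mn = Double.POSITIVE_INFINITY, mx = Double.NEGATIVE_INFINITY;
        double mean = 0.0, m2 = 0.0; // Welford accumulators
        int n = 0;
        for (double v : values) {
            mn = Math.min(mn, v);
            mx = Math.max(mx, v);
            n++;
            double delta = v - mean;
            mean += delta / n;
            m2 += delta * (v - mean);
        }
        this.min = mn;
        this.max = mx;
        this.mean = mean;
        // sample standard deviation (divide by n - 1), as in commons-math
        this.stddev = n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }
}
```

The point the issue makes is that the naive approach iterates the array once per statistic (four times); the snapshot computes all four in a single loop.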



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (FLINK-12981) Ignore NaN values in histogram's percentile implementation

2019-08-21 Thread Nico Kruber (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-12981.
---
Fix Version/s: 1.10.0
   Resolution: Fixed

merged into master via e59b9d2

> Ignore NaN values in histogram's percentile implementation
> --
>
> Key: FLINK-12981
> URL: https://issues.apache.org/jira/browse/FLINK-12981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Metrics
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Histogram metrics use "long" values and therefore, there is no {{Double.NaN}} 
> in {{DescriptiveStatistics}}' data and there is no need to cleanse it while 
> working with it.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13793) Build different language docs in parallel

2019-08-20 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-13793:
---

 Summary: Build different language docs in parallel
 Key: FLINK-13793
 URL: https://issues.apache.org/jira/browse/FLINK-13793
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Nico Kruber
Assignee: Nico Kruber


Unfortunately, jekyll lacks parallel builds and thus leaves available resources 
idle. In the special case of building the documentation without serving it, we 
could build each language (en, zh) in a separate sub-process and thus gain some 
parallelism.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13791) Speed up sidenav by using group_by

2019-08-19 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-13791:
---

 Summary: Speed up sidenav by using group_by
 Key: FLINK-13791
 URL: https://issues.apache.org/jira/browse/FLINK-13791
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Nico Kruber
Assignee: Nico Kruber


{{_includes/sidenav.html}} parses through {{pages_by_language}} over and over 
again trying to find children when building the (recursive) side navigation. We 
could do this once with a {{group_by}} instead.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (FLINK-12984) Only call Histogram#getStatistics() once per set of retrieved statistics

2019-08-15 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-12984.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

fixed on master via d9f012746f5b8b36ebb416f70e9f5bac93538d5d

> Only call Histogram#getStatistics() once per set of retrieved statistics
> 
>
> Key: FLINK-12984
> URL: https://issues.apache.org/jira/browse/FLINK-12984
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Metrics
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In some occasions, {{Histogram#getStatistics()}} was called multiple times to 
> retrieve different statistics. However, at least the Dropwizard 
> implementation has some constant overhead per call and we should maybe rather 
> interpret this method as returning a point-in-time snapshot of the histogram 
> in order to get consistent values when querying them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (FLINK-12987) DescriptiveStatisticsHistogram#getCount does not return the number of elements seen

2019-08-15 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-12987.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

fixed on master via fd9ef60cc8448a5f4d1915973e168aad073d8e8d

> DescriptiveStatisticsHistogram#getCount does not return the number of 
> elements seen
> ---
>
> Key: FLINK-12987
> URL: https://issues.apache.org/jira/browse/FLINK-12987
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.6.4, 1.7.2, 1.8.0, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{DescriptiveStatisticsHistogram#getCount()}} returns the number of elements 
> in the current window and not the number of total elements seen over time. In 
> contrast, {{DropwizardHistogramWrapper}} does this correctly.
> We should unify the behaviour and add a unit test for it (there is no generic 
> histogram test yet).
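
The fix described above can be illustrated by keeping a monotonically increasing counter next to the bounded sample window. This is a hypothetical sketch of the semantics, not the actual Flink class:

```java
// Sketch: getCount() reports the total number of elements ever seen,
// independent of the bounded sliding window used for the statistics.
public class CountingHistogram {
    private final double[] window;
    private int nextIndex = 0;
    private long totalCount = 0; // elements seen over time, never decreases

    public CountingHistogram(int windowSize) {
        this.window = new double[windowSize];
    }

    public void update(double value) {
        window[nextIndex] = value;
        nextIndex = (nextIndex + 1) % window.length;
        totalCount++; // the window may overwrite old values, this counter does not
    }

    /** Number of elements ever added, matching DropwizardHistogramWrapper. */
    public long getCount() {
        return totalCount;
    }

    /** Number of elements currently held in the sliding window. */
    public int getWindowSize() {
        return (int) Math.min(totalCount, window.length);
    }
}
```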



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-12987) DescriptiveStatisticsHistogram#getCount does not return the number of elements seen

2019-08-15 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-12987:

Affects Version/s: 1.9.0

> DescriptiveStatisticsHistogram#getCount does not return the number of 
> elements seen
> ---
>
> Key: FLINK-12987
> URL: https://issues.apache.org/jira/browse/FLINK-12987
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.6.4, 1.7.2, 1.8.0, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{DescriptiveStatisticsHistogram#getCount()}} returns the number of elements 
> in the current window and not the number of total elements seen over time. In 
> contrast, {{DropwizardHistogramWrapper}} does this correctly.
> We should unify the behaviour and add a unit test for it (there is no generic 
> histogram test yet).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (FLINK-13020) UT Failure: ChainLengthDecreaseTest

2019-08-15 Thread Nico Kruber (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908534#comment-16908534
 ] 

Nico Kruber edited comment on FLINK-13020 at 8/15/19 11:00 PM:
---

Actually, I just encountered this error in a branch of mine which is based on 
[latest 
master|https://github.com/apache/flink/commit/428ce1b938813fba287a51bf86e6c52ef54453cb].
 So either there has been a regression, or the fix does not work in all cases, 
or it is not a duplicate after all:
{code}
17:30:18.083 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 14.113 s <<< FAILURE! - in 
org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
17:30:18.083 [ERROR] testMigrationAndRestore[Migrate Savepoint: 
1.8](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest)
  Time elapsed: 0.268 s  <<< ERROR!
java.util.concurrent.ExecutionException: 
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
cancellation from one of its inputs
Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
received cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
received cancellation from one of its inputs
{code}

https://api.travis-ci.com/v3/job/225588484/log.txt

{code}
17:30:17,408 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask  
 - Configuring application-defined state backend with job/cluster config
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   
 - Source: Custom Source (2/4) (ffb5e756d6acddab9cab76e2a0a32904) switched from 
DEPLOYING to RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   
 - Map (4/4) (79fcf333d4d11eae297b65e52e397658) switched from DEPLOYING to 
RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   
 - Map (2/4) (aedaa4a61e74a3b766fafbef46e6aea6) switched from DEPLOYING to 
RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   
 - Source: Custom Source (4/4) (a1f07e2714e73b2533291a322961ea67) switched from 
DEPLOYING to RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   
 - Source: Custom Source (3/4) (6073be38d7be0ee571558f1dc865837a) switched from 
DEPLOYING to RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   
 - Map (1/4) (e4bc84d8137769b513d1a5107027500d) switched from DEPLOYING to 
RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   
 - Map (3/4) (6834950d9742da9c6a784ecc5ee892df) switched from DEPLOYING to 
RUNNING.
17:30:17,409 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,413 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,414 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,416 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,417 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator
 - Checkpoint triggering task Source: Custom Source (1/4) of job 
075cea7da1d0690f96c879ae07b058c0 is not in state RUNNING but DEPLOYING instead. 
Aborting checkpoint.
17:30:17,423 INFO  org.apache.flink.runtime.taskmanager.Task
 - Source: Custom Source (1/4) (8b302fefb0c10b7fd0b66f4fdb253632) switched from 
DEPLOYING to RUNNING.
17:30:17,423 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask  
 - Using application-defined state backend: MemoryStateBackend (data in heap 
memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', 
asynchronous: UNDEFINED, maxStateSize: 5242880)
17:30:17,423 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask  
 - Configuring application-defined state backend with job/cluster config
17:30:17,424 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   
 - Source: Custom Source (1/4) 

[jira] [Updated] (FLINK-13020) UT Failure: ChainLengthDecreaseTest

2019-08-15 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13020:

Affects Version/s: 1.10.0

> UT Failure: ChainLengthDecreaseTest
> ---
>
> Key: FLINK-13020
> URL: https://issues.apache.org/jira/browse/FLINK-13020
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Bowen Li
>Priority: Major
>
> {code:java}
> 05:47:24.893 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 19.836 s <<< FAILURE! - in 
> org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
> 05:47:24.895 [ERROR] testMigrationAndRestore[Migrate Savepoint: 
> 1.3](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest)
>   Time elapsed: 1.501 s  <<< ERROR!
> java.util.concurrent.ExecutionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
> cancellation from one of its inputs
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
> cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
> received cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
> received cancellation from one of its inputs
> ...
> 05:48:27.736 [ERROR] Errors: 
> 05:48:27.736 [ERROR]   
> ChainLengthDecreaseTest>AbstractOperatorRestoreTestBase.testMigrationAndRestore:102->AbstractOperatorRestoreTestBase.migrateJob:138
>  » Execution
> 05:48:27.736 [INFO] 
> {code}
> https://travis-ci.org/apache/flink/jobs/551053821



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13020) UT Failure: ChainLengthDecreaseTest

2019-08-15 Thread Nico Kruber (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908534#comment-16908534
 ] 

Nico Kruber commented on FLINK-13020:
-

Actually, I just encountered this error in a branch of mine which is based on 
[latest 
master|https://github.com/apache/flink/commit/428ce1b938813fba287a51bf86e6c52ef54453cb].
 So either there has been a regression, or the fix does not work in all cases, 
or it is not a duplicate after all:
{code}
17:30:18.083 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 14.113 s <<< FAILURE! - in 
org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
17:30:18.083 [ERROR] testMigrationAndRestore[Migrate Savepoint: 
1.8](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest)
  Time elapsed: 0.268 s  <<< ERROR!
java.util.concurrent.ExecutionException: 
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
cancellation from one of its inputs
Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
received cancellation from one of its inputs
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
received cancellation from one of its inputs
{code}

https://api.travis-ci.com/v3/job/225588484/log.txt

> UT Failure: ChainLengthDecreaseTest
> ---
>
> Key: FLINK-13020
> URL: https://issues.apache.org/jira/browse/FLINK-13020
> Project: Flink
>  Issue Type: Improvement
>Reporter: Bowen Li
>Priority: Major
>
> {code:java}
> 05:47:24.893 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 19.836 s <<< FAILURE! - in 
> org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
> 05:47:24.895 [ERROR] testMigrationAndRestore[Migrate Savepoint: 
> 1.3](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest)
>   Time elapsed: 1.501 s  <<< ERROR!
> java.util.concurrent.ExecutionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
> cancellation from one of its inputs
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
> cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
> received cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
> received cancellation from one of its inputs
> ...
> 05:48:27.736 [ERROR] Errors: 
> 05:48:27.736 [ERROR]   
> ChainLengthDecreaseTest>AbstractOperatorRestoreTestBase.testMigrationAndRestore:102->AbstractOperatorRestoreTestBase.migrateJob:138
>  » Execution
> 05:48:27.736 [INFO] 
> {code}
> https://travis-ci.org/apache/flink/jobs/551053821



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Reopened] (FLINK-13020) UT Failure: ChainLengthDecreaseTest

2019-08-15 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reopened FLINK-13020:
-

> UT Failure: ChainLengthDecreaseTest
> ---
>
> Key: FLINK-13020
> URL: https://issues.apache.org/jira/browse/FLINK-13020
> Project: Flink
>  Issue Type: Improvement
>Reporter: Bowen Li
>Priority: Major
>
> {code:java}
> 05:47:24.893 [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 19.836 s <<< FAILURE! - in 
> org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest
> 05:47:24.895 [ERROR] testMigrationAndRestore[Migrate Savepoint: 
> 1.3](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthDecreaseTest)
>   Time elapsed: 1.501 s  <<< ERROR!
> java.util.concurrent.ExecutionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
> cancellation from one of its inputs
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task received 
> cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
> received cancellation from one of its inputs
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task 
> received cancellation from one of its inputs
> ...
> 05:48:27.736 [ERROR] Errors: 
> 05:48:27.736 [ERROR]   
> ChainLengthDecreaseTest>AbstractOperatorRestoreTestBase.testMigrationAndRestore:102->AbstractOperatorRestoreTestBase.migrateJob:138
>  » Execution
> 05:48:27.736 [INFO] 
> {code}
> https://travis-ci.org/apache/flink/jobs/551053821



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13727) Build docs with jekyll 4.0.0 (final)

2019-08-14 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13727:

Description: 
When Jekyll 4.0.0 is out, we should upgrade to this final version and 
discontinue using the beta.

When we make this final, we could also follow these official recommendations:
{quote}
-
This version of Jekyll comes with some major changes.

Most notably:
  * Our `link` tag now comes with the `relative_url` filter incorporated into 
it.
You should no longer prepend `{{ site.baseurl }}` to `{% link foo.md %}`
For further details: https://github.com/jekyll/jekyll/pull/6727

  * Our `post_url` tag now comes with the `relative_url` filter incorporated 
into it.
You shouldn't prepend `{{ site.baseurl }}` to `{% post_url 2019-03-27-hello 
%}`
For further details: https://github.com/jekyll/jekyll/pull/7589

  * Support for deprecated configuration options has been removed. We will no 
longer
output a warning and gracefully assign their values to the newer 
counterparts
internally.
-
{quote}

  was:When Jekyll 4.0.0 is out, we should upgrade to this final version and 
discontinue using the beta.


> Build docs with jekyll 4.0.0 (final)
> 
>
> Key: FLINK-13727
> URL: https://issues.apache.org/jira/browse/FLINK-13727
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Priority: Major
>
> When Jekyll 4.0.0 is out, we should upgrade to this final version and 
> discontinue using the beta.
> When we make this final, we could also follow these official recommendations:
> {quote}
> -
> This version of Jekyll comes with some major changes.
> Most notably:
>   * Our `link` tag now comes with the `relative_url` filter incorporated into 
> it.
> You should no longer prepend `{{ site.baseurl }}` to `{% link foo.md %}`
> For further details: https://github.com/jekyll/jekyll/pull/6727
>   * Our `post_url` tag now comes with the `relative_url` filter incorporated 
> into it.
> You shouldn't prepend `{{ site.baseurl }}` to `{% post_url 
> 2019-03-27-hello %}`
> For further details: https://github.com/jekyll/jekyll/pull/7589
>   * Support for deprecated configuration options has been removed. We will no 
> longer
> output a warning and gracefully assign their values to the newer 
> counterparts
> internally.
> -
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13729) Update website generation dependencies

2019-08-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13729:
---

 Summary: Update website generation dependencies
 Key: FLINK-13729
 URL: https://issues.apache.org/jira/browse/FLINK-13729
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The website generation dependencies are quite old. By upgrading some of them, we 
get improvements like much nicer code highlighting and prepare for the jekyll 
updates of FLINK-13726 and FLINK-13727.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13728) Fix wrong closing tag order in sidenav

2019-08-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13728:
---

 Summary: Fix wrong closing tag order in sidenav
 Key: FLINK-13728
 URL: https://issues.apache.org/jira/browse/FLINK-13728
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.8.1, 1.9.0, 1.10.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The order of closing HTML tags in the sidenav is wrong: instead of 
{{}} it should be {{}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav

2019-08-14 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13724:

Description: The side navigation generates quite some white space that will 
end up in every HTML page. Removing this reduces final page sizes and also 
improves site generation speed.  (was: The site navigation generates quite some 
white space that will end up in every HTML page. Removing this reduces final 
page sizes and also improved site generation speed.)

> Remove unnecessary whitespace from the docs' sidenav
> 
>
> Key: FLINK-13724
> URL: https://issues.apache.org/jira/browse/FLINK-13724
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>
> The side navigation generates quite some white space that will end up in 
> every HTML page. Removing this reduces final page sizes and also improves 
> site generation speed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13724) Remove unnecessary whitespace from the docs' sidenav

2019-08-14 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13724:

Summary: Remove unnecessary whitespace from the docs' sidenav  (was: Remove 
unnecessary whitespace from the docs' sitenav)

> Remove unnecessary whitespace from the docs' sidenav
> 
>
> Key: FLINK-13724
> URL: https://issues.apache.org/jira/browse/FLINK-13724
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>
> The site navigation generates quite some white space that will end up in 
> every HTML page. Removing this reduces final page sizes and also improves 
> site generation speed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13726) Build docs with jekyll 4.0.0.pre.beta1

2019-08-14 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13726:

Description: Jekyll 4 is way faster in generating the docs than jekyll 3 - 
probably due to the newly introduced cache. Site generation time goes down by 
roughly a factor of 2.5 even with the current beta version!  (was: Jekyll 4 is 
way faster in generating the docs than jekyll 3 - probably due to the newly 
introduced cache. Site generation time goes down by roughly a factor of 2.5!)

> Build docs with jekyll 4.0.0.pre.beta1
> --
>
> Key: FLINK-13726
> URL: https://issues.apache.org/jira/browse/FLINK-13726
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.10.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>
> Jekyll 4 is way faster in generating the docs than jekyll 3 - probably due to 
> the newly introduced cache. Site generation time goes down by roughly a 
> factor of 2.5 even with the current beta version!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13727) Build docs with jekyll 4.0.0 (final)

2019-08-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13727:
---

 Summary: Build docs with jekyll 4.0.0 (final)
 Key: FLINK-13727
 URL: https://issues.apache.org/jira/browse/FLINK-13727
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Nico Kruber


When Jekyll 4.0.0 is out, we should upgrade to this final version and 
discontinue using the beta.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13726) Build docs with jekyll 4.0.0.pre.beta1

2019-08-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13726:
---

 Summary: Build docs with jekyll 4.0.0.pre.beta1
 Key: FLINK-13726
 URL: https://issues.apache.org/jira/browse/FLINK-13726
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Jekyll 4 is way faster in generating the docs than jekyll 3 - probably due to 
the newly introduced cache. Site generation time goes down by roughly a factor 
of 2.5!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13725) Use sassc for faster doc generation

2019-08-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13725:
---

 Summary: Use sassc for faster doc generation
 Key: FLINK-13725
 URL: https://issues.apache.org/jira/browse/FLINK-13725
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Jekyll requires {{sass}} but can optionally also use a C-based implementation 
provided by {{sassc}}. Although we do not use sass directly, there may be some 
indirect use inside jekyll. It doesn't seem to hurt to upgrade here.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13724) Remove unnecessary whitespace from the docs' sitenav

2019-08-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13724:
---

 Summary: Remove unnecessary whitespace from the docs' sitenav
 Key: FLINK-13724
 URL: https://issues.apache.org/jira/browse/FLINK-13724
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The site navigation generates quite some white space that will end up in every 
HTML page. Removing this reduces final page sizes and also improves site 
generation speed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13723) Use liquid-c for faster doc generation

2019-08-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13723:
---

 Summary: Use liquid-c for faster doc generation
 Key: FLINK-13723
 URL: https://issues.apache.org/jira/browse/FLINK-13723
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Jekyll requires {{liquid}} and only optionally uses {{liquid-c}} if available. 
The latter uses natively-compiled code and reduces generation time by ~5% for 
me.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13722) Speed up documentation generation

2019-08-14 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13722:
---

 Summary: Speed up documentation generation
 Key: FLINK-13722
 URL: https://issues.apache.org/jira/browse/FLINK-13722
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Creating the documentation via {{build_docs.sh}} currently takes about 150s!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13537) Changing Kafka producer pool size and scaling out may create overlapping transaction IDs

2019-08-07 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13537:

Description: 
The Kafka producer's transaction IDs are only generated once when there was no 
previous state for that operator. In the case where we restore and increase 
parallelism (scale-out), some operators may not have previous state and create 
new IDs. Now, if we also reduce the {{poolSize}}, these new IDs may overlap 
with the old ones which should never happen! Similarly, a scale-in + increasing 
{{poolSize}} could lead the the same thing.

An easy "fix" for this would be to forbid changing the {{poolSize}}. We could 
potentially do a bit better by only forbidding changes that can lead to 
transaction ID overlaps which we can identify from the formulae that 
{{TransactionalIdsGenerator}} uses. This should probably be the first step 
which can also be back-ported to older Flink versions just in case.


On a side note, the current scheme also relies on the fact that the operator's 
list state distributes previous states during scale-out in a fashion that only 
the operators with the highest subtask indices do not get a previous state. 
This is somewhat "guaranteed" by {{OperatorStateStore#getListState()}} but I'm 
not sure whether we should actually rely on that there.

  was:
The Kafka producer's transaction IDs are only generated once when there was no 
previous state for that operator. In the case where we restore and increase 
parallelism (scale-out), some operators may not have previous state and create 
new IDs. Now, if we also reduce the poolSize, these new IDs may overlap with 
the old ones which should never happen!

On a side note, the current scheme also relies on the fact that the operator's 
list state distributes previous states during scale-out in a fashion that only 
the operators with the highest subtask indices do not get a previous state. 
This is somewhat "guaranteed" by {{OperatorStateStore#getListState()}} but I'm 
not sure whether we should actually rely on that there.


> Changing Kafka producer pool size and scaling out may create overlapping 
> transaction IDs
> 
>
> Key: FLINK-13537
> URL: https://issues.apache.org/jira/browse/FLINK-13537
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Priority: Major
>
> The Kafka producer's transaction IDs are only generated once when there was 
> no previous state for that operator. In the case where we restore and 
> increase parallelism (scale-out), some operators may not have previous state 
> and create new IDs. Now, if we also reduce the {{poolSize}}, these new IDs 
> may overlap with the old ones, which should never happen! Similarly, a 
> scale-in combined with an increased {{poolSize}} could lead to the same thing.
> An easy "fix" for this would be to forbid changing the {{poolSize}}. We could 
> potentially do a bit better by only forbidding changes that can lead to 
> transaction ID overlaps which we can identify from the formulae that 
> {{TransactionalIdsGenerator}} uses. This should probably be the first step 
> which can also be back-ported to older Flink versions just in case.
> 
> On a side note, the current scheme also relies on the fact that the 
> operator's list state distributes previous states during scale-out in a 
> fashion that only the operators with the highest subtask indices do not get a 
> previous state. This is somewhat "guaranteed" by 
> {{OperatorStateStore#getListState()}} but I'm not sure whether we should 
> actually rely on that there.
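To see how such an overlap can arise, consider a toy ID scheme where subtask i with pool size p owns IDs [i*p, (i+1)*p). This scheme is an assumption for illustration only, not the exact formula that {{TransactionalIdsGenerator}} uses:

```java
import java.util.HashSet;
import java.util.Set;

public class TransactionIdOverlap {
    // Toy ID scheme (illustrative assumption, not Flink's actual formula):
    // subtask i with pool size p uses the IDs [i * p, (i + 1) * p).
    static Set<Long> idsFor(int subtaskIndex, int poolSize) {
        Set<Long> ids = new HashSet<>();
        for (int j = 0; j < poolSize; j++) {
            ids.add((long) subtaskIndex * poolSize + j);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Old run: poolSize 5, so subtask 1 owned the IDs 5..9.
        Set<Long> oldSubtask1 = idsFor(1, 5);
        // After scale-out with a reduced poolSize of 2, the new, stateless
        // subtask 2 generates the IDs 4..5 -- ID 5 collides with what old
        // subtask 1 still owns in its restored state.
        Set<Long> newSubtask2 = idsFor(2, 2);
        Set<Long> overlap = new HashSet<>(oldSubtask1);
        overlap.retainAll(newSubtask2);
        System.out.println(overlap); // non-empty => collision
    }
}
```

Under this toy scheme, forbidding the poolSize change would remove the collision, which is the spirit of the "easy fix" mentioned above.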



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (FLINK-13498) Reduce Kafka producer startup time by aborting transactions in parallel

2019-08-06 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-13498.
---
   Resolution: Fixed
Fix Version/s: 1.10.0

fixed on master via d774fea

> Reduce Kafka producer startup time by aborting transactions in parallel
> ---
>
> Key: FLINK-13498
> URL: https://issues.apache.org/jira/browse/FLINK-13498
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a Flink job with a Kafka producer starts up without previous state, it 
> currently starts 5 * kafkaPoolSize Kafka producers (per sink 
> instance) to abort potentially existing transactions from a first run without 
> a completed snapshot.
> Apparently, this is quite slow and it is also done sequentially. Until there 
> is a better way of aborting these transactions with Kafka, we could do this 
> in parallel quite easily and at least make use of lingering CPU resources.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13535) Do not abort transactions twice during KafkaProducer startup

2019-08-01 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13535:

Description: 
During startup of a transactional Kafka producer from previous state, we 
recover in two steps:
# in {{TwoPhaseCommitSinkFunction}}, we commit pending commit-transactions and 
abort pending transactions and then call into {{finishRecoveringContext()}}
# in {{FlinkKafkaProducer#finishRecoveringContext()}} we iterate over all 
recovered transaction IDs and abort them.

This may lead to some transactions being worked on twice. Since this is quite 
an expensive operation, we unnecessarily slow down the job startup but could 
easily give {{finishRecoveringContext()}} a set of transactions that 
{{TwoPhaseCommitSinkFunction}} already covered instead.

  was:
During startup of a transactional Kafka producer from previous state, we 
recover in two steps:
# in {{TwoPhaseCommitSinkFunction}}, we commit pending commit-transactions and 
abort pending transactions and then call into {{finishRecoveringContext()}}
# in {{FlinkKafkaProducer#finishRecoveringContext()}} we iterate over all 
recovered transaction IDs and abort them
This may lead to some transactions being worked on twice. Since this is quite 
an expensive operation, we unnecessarily slow down the job startup but could 
easily give {{finishRecoveringContext()}} a set of transactions that 
{{TwoPhaseCommitSinkFunction}} already covered instead.


> Do not abort transactions twice during KafkaProducer startup
> 
>
> Key: FLINK-13535
> URL: https://issues.apache.org/jira/browse/FLINK-13535
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kafka
>Affects Versions: 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>
> During startup of a transactional Kafka producer from previous state, we 
> recover in two steps:
> # in {{TwoPhaseCommitSinkFunction}}, we commit pending commit-transactions 
> and abort pending transactions and then call into 
> {{finishRecoveringContext()}}
> # in {{FlinkKafkaProducer#finishRecoveringContext()}} we iterate over all 
> recovered transaction IDs and abort them.
> This may lead to some transactions being worked on twice. Since this is quite 
> an expensive operation, we unnecessarily slow down the job startup but 
> could easily give {{finishRecoveringContext()}} a set of transactions that 
> {{TwoPhaseCommitSinkFunction}} already covered instead.
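A minimal sketch of the proposed hand-over (names are illustrative, not Flink's actual API): the first recovery step passes along the transaction IDs it already handled, and the Kafka-specific step only aborts the rest.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

public class RecoveryAbortPlan {
    // Sketch of the fix: the generic two-phase-commit recovery (step 1)
    // hands over the set of transaction IDs it already committed or aborted,
    // so the Kafka-specific finishRecoveringContext() (step 2) skips them
    // instead of aborting them a second time.
    static List<String> idsStillToAbort(Collection<String> recoveredIds,
                                        Set<String> alreadyHandled) {
        List<String> remaining = new ArrayList<>();
        for (String id : recoveredIds) {
            if (!alreadyHandled.contains(id)) {
                remaining.add(id); // only abort what step 1 did not touch
            }
        }
        return remaining;
    }
}
```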



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13537) Changing Kafka producer pool size and scaling out may create overlapping transaction IDs

2019-08-01 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13537:

Description: 
The Kafka producer's transaction IDs are only generated once when there was no 
previous state for that operator. In the case where we restore and increase 
parallelism (scale-out), some operators may not have previous state and create 
new IDs. Now, if we also reduce the poolSize, these new IDs may overlap with 
the old ones which should never happen!

On a side note, the current scheme also relies on the fact that the operator's 
list state distributes previous states during scale-out in a fashion that only 
the operators with the highest subtask indices do not get a previous state. 
This is somewhat "guaranteed" by {{OperatorStateStore#getListState()}} but I'm 
not sure whether we should actually rely on that there.

  was:The Kafka producer's transaction IDs are only generated once when there 
was no previous state for that operator. In the case where we restore and 
increase parallelism (scale-out), some operators may not have previous state 
and create new IDs. Now, if we also reduce the poolSize, these new IDs may 
overlap with the old ones which should never happen!


> Changing Kafka producer pool size and scaling out may create overlapping 
> transaction IDs
> 
>
> Key: FLINK-13537
> URL: https://issues.apache.org/jira/browse/FLINK-13537
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Priority: Major
>
> The Kafka producer's transaction IDs are only generated once when there was 
> no previous state for that operator. In the case where we restore and 
> increase parallelism (scale-out), some operators may not have previous state 
> and create new IDs. Now, if we also reduce the poolSize, these new IDs may 
> overlap with the old ones which should never happen!
> On a side note, the current scheme also relies on the fact that the 
> operator's list state distributes previous states during scale-out in a 
> fashion that only the operators with the highest subtask indices do not get a 
> previous state. This is somewhat "guaranteed" by 
> {{OperatorStateStore#getListState()}} but I'm not sure whether we should 
> actually rely on that there.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13537) Changing Kafka producer pool size and scaling out may create overlapping transaction IDs

2019-08-01 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13537:
---

 Summary: Changing Kafka producer pool size and scaling out may 
create overlapping transaction IDs
 Key: FLINK-13537
 URL: https://issues.apache.org/jira/browse/FLINK-13537
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.8.1, 1.9.0
Reporter: Nico Kruber


The Kafka producer's transaction IDs are only generated once when there was no 
previous state for that operator. In the case where we restore and increase 
parallelism (scale-out), some operators may not have previous state and create 
new IDs. Now, if we also reduce the poolSize, these new IDs may overlap with 
the old ones which should never happen!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13535) Do not abort transactions twice during KafkaProducer startup

2019-08-01 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13535:
---

 Summary: Do not abort transactions twice during KafkaProducer 
startup
 Key: FLINK-13535
 URL: https://issues.apache.org/jira/browse/FLINK-13535
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Affects Versions: 1.8.1, 1.9.0
Reporter: Nico Kruber
Assignee: Nico Kruber


During startup of a transactional Kafka producer from previous state, we 
recover in two steps:
# in {{TwoPhaseCommitSinkFunction}}, we commit pending commit-transactions and 
abort pending transactions and then call into {{finishRecoveringContext()}}
# in {{FlinkKafkaProducer#finishRecoveringContext()}} we iterate over all 
recovered transaction IDs and abort them
This may lead to some transactions being worked on twice. Since this is quite 
an expensive operation, we unnecessarily slow down the job startup but could 
easily give {{finishRecoveringContext()}} a set of transactions that 
{{TwoPhaseCommitSinkFunction}} already covered instead.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (FLINK-13517) Restructure Hive Catalog documentation

2019-07-31 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reassigned FLINK-13517:
---

Assignee: Seth Wiesman

> Restructure Hive Catalog documentation
> --
>
> Key: FLINK-13517
> URL: https://issues.apache.org/jira/browse/FLINK-13517
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive, Documentation
>Reporter: Seth Wiesman
>Assignee: Seth Wiesman
>Priority: Major
>
> Hive documentation is currently spread across a number of pages and 
> fragmented. In particular: 
> 1) An example was added to getting-started/examples, however, this section is 
> being removed
> 2) There is a dedicated page on hive integration but also a lot of hive 
> specific information is on the catalog page
> We should
> 1) Inline the example into the hive integration page
> 2) Move the hive specific information on catalogs.md to hive_integration.md
> 3) Make catalogs.md be just about catalogs in general and link to the hive 
> integration. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13498) Reduce Kafka producer startup time by aborting transactions in parallel

2019-07-30 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13498:
---

 Summary: Reduce Kafka producer startup time by aborting 
transactions in parallel
 Key: FLINK-13498
 URL: https://issues.apache.org/jira/browse/FLINK-13498
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.8.1, 1.9.0
Reporter: Nico Kruber
Assignee: Nico Kruber


When a Flink job with a Kafka producer starts up without previous state, it 
currently starts 5 * kafkaPoolSize Kafka producers (per sink 
instance) to abort potentially existing transactions from a first run without a 
completed snapshot.
Apparently, this is quite slow and it is also done sequentially. Until there is 
a better way of aborting these transactions with Kafka, we could do this in 
parallel quite easily and at least make use of lingering CPU resources.
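The parallel abort could look roughly like this (a sketch only; the interface and names are made up, and the real implementation would live inside the Kafka producer sink and spin up short-lived producer instances):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelTransactionAbort {
    // Hypothetical stand-in for whatever aborts a single transactional ID.
    interface Aborter {
        void abort(String transactionalId);
    }

    // Abort all candidate transactional IDs concurrently instead of one by
    // one, making use of otherwise idle CPU resources during startup.
    static void abortAll(List<String> transactionalIds, Aborter aborter,
                         int parallelism) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            for (String id : transactionalIds) {
                pool.execute(() -> aborter.abort(id));
            }
        } finally {
            pool.shutdown();
            // Wait until all aborts have finished before starting the job.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }
}
```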



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (FLINK-12747) Getting Started - Table API Example Walkthrough

2019-07-30 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-12747.
---

> Getting Started - Table API Example Walkthrough
> ---
>
> Key: FLINK-12747
> URL: https://issues.apache.org/jira/browse/FLINK-12747
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Konstantin Knauf
>Assignee: Seth Wiesman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The planned structure for the new Getting Started Guide is 
> *  Flink Overview (~ two pages)
> * Project Setup
> ** Java
> ** Scala
> ** Python
> * Quickstarts
> ** Example Walkthrough - Table API / SQL
> ** Example Walkthrough - DataStream API
> * Docker Playgrounds
> ** Flink Cluster Playground
> ** Flink Interactive SQL Playground
> This tickets adds the Example Walkthrough for the Table API, which should 
> follow the same structure as the DataStream Example (FLINK-12746), which 
> needs to be completed first.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-12747) Getting Started - Table API Example Walkthrough

2019-07-30 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-12747:

Fix Version/s: (was: 1.10)
   1.10.0

> Getting Started - Table API Example Walkthrough
> ---
>
> Key: FLINK-12747
> URL: https://issues.apache.org/jira/browse/FLINK-12747
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Konstantin Knauf
>Assignee: Seth Wiesman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The planned structure for the new Getting Started Guide is 
> *  Flink Overview (~ two pages)
> * Project Setup
> ** Java
> ** Scala
> ** Python
> * Quickstarts
> ** Example Walkthrough - Table API / SQL
> ** Example Walkthrough - DataStream API
> * Docker Playgrounds
> ** Flink Cluster Playground
> ** Flink Interactive SQL Playground
> This tickets adds the Example Walkthrough for the Table API, which should 
> follow the same structure as the DataStream Example (FLINK-12746), which 
> needs to be completed first.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-12171) The network buffer memory size should not be checked against the heap size on the TM side

2019-07-30 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-12171:

Fix Version/s: (was: 1.10)
   1.10.0

> The network buffer memory size should not be checked against the heap size on 
> the TM side
> -
>
> Key: FLINK-12171
> URL: https://issues.apache.org/jira/browse/FLINK-12171
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.7.2, 1.8.0, 1.9.0
> Environment: Flink-1.7.2, and Flink-1.8 seems have not modified the 
> logic here.
>  
>Reporter: Yun Gao
>Assignee: Yun Gao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, when computing the network buffer memory size on the TM side in 
> _TaskManagerService#calculateNetworkBufferMemory_ (version 1.8 or 1.7) or 
> _NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master), 
> the computed network buffer memory size is checked to be less than 
> `maxJvmHeapMemory`. However, on the TM side, _maxJvmHeapMemory_ stores the 
> maximum heap memory (namely -Xmx).
>  
> With the above process, when TM starts, -Xmx is computed in RM or in 
> _taskmanager.sh_ with (container memory - network buffer memory - managed 
> memory),  thus the above checking implies that the heap memory of the TM must 
> be larger than the network memory, which seems to be not necessary.
>  
> This may cause the TM to use more memory than expected. For example, for a job 
> that has a large network throughput, users may configure network memory to 2G. 
> However, if users want to assign 1G to heap memory, the TM will fail to 
> start, and users have to allocate at least 2G of heap memory (in other words, 
> 4G in total for the TM instead of 3G) to make the TM runnable. This may cause 
> resource inefficiency.
>  
> Therefore, I think the network buffer memory size also needs to be checked 
> against the total memory instead of the heap memory on the TM side:
>  # Check that networkBufFraction < 1.0.
>  # Compute the total memory as (jvmHeapNoNet / (1 - networkBufFraction)).
>  # Compare the network buffer memory with the total memory.
> This checking is also consistent with the similar one done on the RM side.
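The three-step check can be sketched numerically as follows (a minimal illustration of the arithmetic; the class and method names are made up, not Flink's actual code):

```java
public class NetworkMemoryCheck {
    // Derive the total memory from the heap size that excludes network
    // buffers (step 2 of the proposal above).
    static long totalMemory(long jvmHeapNoNet, double networkBufFraction) {
        // Step 1: the fraction must be strictly below 1.0.
        if (networkBufFraction < 0.0 || networkBufFraction >= 1.0) {
            throw new IllegalArgumentException(
                    "networkBufFraction must be in [0, 1)");
        }
        return (long) (jvmHeapNoNet / (1.0 - networkBufFraction));
    }

    // Step 3: compare the network buffer memory against the total memory
    // instead of against the heap alone.
    static boolean networkMemoryFits(long networkBufBytes, long jvmHeapNoNet,
                                     double networkBufFraction) {
        return networkBufBytes <= totalMemory(jvmHeapNoNet, networkBufFraction);
    }
}
```

With a 1G non-network heap and networkBufFraction = 0.5, the derived total is 2G, so a 2G network buffer configuration passes this check even though it exceeds the heap size, which a heap-only check would reject.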



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (FLINK-12171) The network buffer memory size should not be checked against the heap size on the TM side

2019-07-30 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-12171.
---
   Resolution: Fixed
Fix Version/s: 1.10

fixed via 8dec21f

> The network buffer memory size should not be checked against the heap size on 
> the TM side
> -
>
> Key: FLINK-12171
> URL: https://issues.apache.org/jira/browse/FLINK-12171
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.7.2, 1.8.0, 1.9.0
> Environment: Flink-1.7.2, and Flink-1.8 seems have not modified the 
> logic here.
>  
>Reporter: Yun Gao
>Assignee: Yun Gao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, when computing the network buffer memory size on the TM side in 
> _TaskManagerService#calculateNetworkBufferMemory_ (version 1.8 or 1.7) or 
> _NetworkEnvironmentConfiguration#calculateNewNetworkBufferMemory_ (master), 
> the computed network buffer memory size is checked to be less than 
> `maxJvmHeapMemory`. However, on the TM side, _maxJvmHeapMemory_ stores the 
> maximum heap memory (namely -Xmx).
>  
> With the above process, when TM starts, -Xmx is computed in RM or in 
> _taskmanager.sh_ with (container memory - network buffer memory - managed 
> memory),  thus the above checking implies that the heap memory of the TM must 
> be larger than the network memory, which seems to be not necessary.
>  
> This may cause the TM to use more memory than expected. For example, for a job 
> that has a large network throughput, users may configure network memory to 2G. 
> However, if users want to assign 1G to heap memory, the TM will fail to 
> start, and users have to allocate at least 2G of heap memory (in other words, 
> 4G in total for the TM instead of 3G) to make the TM runnable. This may cause 
> resource inefficiency.
>  
> Therefore, I think the network buffer memory size also needs to be checked 
> against the total memory instead of the heap memory on the TM side:
>  # Check that networkBufFraction < 1.0.
>  # Compute the total memory as (jvmHeapNoNet / (1 - networkBufFraction)).
>  # Compare the network buffer memory with the total memory.
> This checking is also consistent with the similar one done on the RM side.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (FLINK-12747) Getting Started - Table API Example Walkthrough

2019-07-29 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber resolved FLINK-12747.
-
   Resolution: Fixed
Fix Version/s: 1.10

fixed via f4943dd

> Getting Started - Table API Example Walkthrough
> ---
>
> Key: FLINK-12747
> URL: https://issues.apache.org/jira/browse/FLINK-12747
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Konstantin Knauf
>Assignee: Seth Wiesman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The planned structure for the new Getting Started Guide is 
> *  Flink Overview (~ two pages)
> * Project Setup
> ** Java
> ** Scala
> ** Python
> * Quickstarts
> ** Example Walkthrough - Table API / SQL
> ** Example Walkthrough - DataStream API
> * Docker Playgrounds
> ** Flink Cluster Playground
> ** Flink Interactive SQL Playground
> This tickets adds the Example Walkthrough for the Table API, which should 
> follow the same structure as the DataStream Example (FLINK-12746), which 
> needs to be completed first.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13417) Bump Zookeeper to 3.5.5

2019-07-29 Thread Nico Kruber (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895170#comment-16895170
 ] 

Nico Kruber commented on FLINK-13417:
-

FYI: since 3.5.5 is the first stable version in the 3.5.x series[1], we should 
actually take this, not any older 3.5.x

[1] https://zookeeper.apache.org/releases.html

> Bump Zookeeper to 3.5.5
> ---
>
> Key: FLINK-13417
> URL: https://issues.apache.org/jira/browse/FLINK-13417
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0
>Reporter: Konstantin Knauf
>Priority: Major
>
> User might want to secure their Zookeeper connection via SSL.
> This requires a Zookeeper version >= 3.5.1. We might as well try to bump it 
> to 3.5.5, which is the latest version. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (FLINK-12741) Update docs about Kafka producer fault tolerance guarantees

2019-07-23 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-12741.
---
   Resolution: Fixed
Fix Version/s: 1.8.2
   1.7.3

merged for
- 1.7: 56c3e7cd653e4cb2ad0a76ca317aa9fa1d564dc2
- 1.8: 91d036f794cfd96a3c1da445d5172690054aee2f

> Update docs about Kafka producer fault tolerance guarantees
> ---
>
> Key: FLINK-12741
> URL: https://issues.apache.org/jira/browse/FLINK-12741
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.9.0
>Reporter: Paul Lin
>Assignee: Paul Lin
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.7.3, 1.8.2, 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since Flink 1.4.0, we provide exactly-once semantic on Kafka 0.11+, but the 
> document is still not updated.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Reopened] (FLINK-12741) Update docs about Kafka producer fault tolerance guarantees

2019-07-23 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reopened FLINK-12741:
-

> Update docs about Kafka producer fault tolerance guarantees
> ---
>
> Key: FLINK-12741
> URL: https://issues.apache.org/jira/browse/FLINK-12741
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.9.0
>Reporter: Paul Lin
>Assignee: Paul Lin
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since Flink 1.4.0, we provide exactly-once semantic on Kafka 0.11+, but the 
> document is still not updated.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13245) Network stack is leaking files

2019-07-19 Thread Nico Kruber (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888982#comment-16888982
 ] 

Nico Kruber commented on FLINK-13245:
-

I agree with [~zjwang] - changing the semantics should be tackled separately, 
not necessarily as part of this bug fix. I'll see when I have time to look at 
the PR so we can get this merged

> Network stack is leaking files
> --
>
> Key: FLINK-13245
> URL: https://issues.apache.org/jira/browse/FLINK-13245
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.9.0
>Reporter: Chesnay Schepler
>Assignee: zhijiang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There's file leak in the network stack / shuffle service.
> When running the {{SlotCountExceedingParallelismTest}} on Windows a large 
> number of {{.channel}} files continue to reside in a 
> {{flink-netty-shuffle-XXX}} directory.
> From what I've gathered so far these files are still being used by a 
> {{BoundedBlockingSubpartition}}. The cleanup logic in this class uses 
> ref-counting to ensure we don't release data while a reader is still present. 
> However, at the end of the job this count has not reached 0, and thus nothing 
> is being released.
> The same issue is also present on the {{ResultPartition}} level; the 
> {{ReleaseOnConsumptionResultPartition}} also are being released while the 
> ref-count is greater than 0.
> Overall it appears like there's some issue with the notifications for 
> partitions being consumed.
> It is feasible that this issue has recently caused issues on Travis where the 
> build were failing due to a lack of disk space.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3

2019-07-17 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-8801.
--
Resolution: Fixed

> S3's eventual consistent read-after-write may fail yarn deployment of 
> resources to S3
> -
>
> Key: FLINK-8801
> URL: https://issues.apache.org/jira/browse/FLINK-8801
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / Coordination
>Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.9.0, 1.4.3, 1.5.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> According to 
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel:
> {quote}
> Amazon S3 provides read-after-write consistency for PUTS of new objects in 
> your S3 bucket in all regions with one caveat. The caveat is that if you make 
> a HEAD or GET request to the key name (to find if the object exists) before 
> creating the object, Amazon S3 provides eventual consistency for 
> read-after-write.
> {quote}
> Some S3 file system implementations may actually execute such a request for 
> the about-to-write object and thus the read-after-write is only eventually 
> consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently 
> relies on a consistent read-after-write since it accesses the remote resource 
> to get file size and modification timestamp. Since we have access to the 
> local resource at that point, we can use its data instead and circumvent the 
> problem.
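The workaround described above — registering the resource from locally-known metadata rather than issuing a fresh remote lookup — can be sketched as follows. This is a hedged illustration with hypothetical names; the actual logic lives in {{org.apache.flink.yarn.Utils#setupLocalResource()}} and uses YARN's {{LocalResource}} API rather than this helper class.

```java
import java.io.File;

public class LocalResourceMeta {
    // Hypothetical sketch: derive resource-registration metadata from the
    // local file that was just uploaded, instead of HEAD/GET-ing the remote
    // copy, which S3 only serves with eventual consistency after a
    // pre-write existence check.
    final long size;
    final long modificationTime;

    LocalResourceMeta(long size, long modificationTime) {
        this.size = size;
        this.modificationTime = modificationTime;
    }

    static LocalResourceMeta fromLocalFile(File localFile) {
        // No remote round-trip: the local copy is byte-identical to what
        // was uploaded, so its size and timestamp are safe to use.
        return new LocalResourceMeta(localFile.length(), localFile.lastModified());
    }
}
```

The design point is simply that any value obtainable from the local copy should not be re-read from the eventually consistent remote store.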



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3

2019-07-17 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-8801:
---
Fix Version/s: (was: 1.10.0)

> S3's eventual consistent read-after-write may fail yarn deployment of 
> resources to S3
> --------------------------------------------------------------------------------
>
> Key: FLINK-8801
> URL: https://issues.apache.org/jira/browse/FLINK-8801
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / Coordination
>Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.0, 1.9.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> According to 
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel:
> {quote}
> Amazon S3 provides read-after-write consistency for PUTS of new objects in 
> your S3 bucket in all regions with one caveat. The caveat is that if you make 
> a HEAD or GET request to the key name (to find if the object exists) before 
> creating the object, Amazon S3 provides eventual consistency for 
> read-after-write.
> {quote}
> Some S3 file system implementations may actually execute such a request for 
> the about-to-write object and thus the read-after-write is only eventually 
> consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently 
> relies on a consistent read-after-write since it accesses the remote resource 
> to get file size and modification timestamp. Since we have access to the 
> local resource at that point, we can use its data instead and circumvent the 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Reopened] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3

2019-07-17 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reopened FLINK-8801:


> S3's eventual consistent read-after-write may fail yarn deployment of 
> resources to S3
> --------------------------------------------------------------------------------
>
> Key: FLINK-8801
> URL: https://issues.apache.org/jira/browse/FLINK-8801
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / Coordination
>Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.0, 1.9.0, 1.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> According to 
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel:
> {quote}
> Amazon S3 provides read-after-write consistency for PUTS of new objects in 
> your S3 bucket in all regions with one caveat. The caveat is that if you make 
> a HEAD or GET request to the key name (to find if the object exists) before 
> creating the object, Amazon S3 provides eventual consistency for 
> read-after-write.
> {quote}
> Some S3 file system implementations may actually execute such a request for 
> the about-to-write object and thus the read-after-write is only eventually 
> consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently 
> relies on a consistent read-after-write since it accesses the remote resource 
> to get file size and modification timestamp. Since we have access to the 
> local resource at that point, we can use its data instead and circumvent the 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3

2019-07-16 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-8801.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

also merged to release-1.9 via b56234ce4e

> S3's eventual consistent read-after-write may fail yarn deployment of 
> resources to S3
> --------------------------------------------------------------------------------
>
> Key: FLINK-8801
> URL: https://issues.apache.org/jira/browse/FLINK-8801
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / Coordination
>Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.9.0, 1.10.0, 1.4.3, 1.5.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> According to 
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel:
> {quote}
> Amazon S3 provides read-after-write consistency for PUTS of new objects in 
> your S3 bucket in all regions with one caveat. The caveat is that if you make 
> a HEAD or GET request to the key name (to find if the object exists) before 
> creating the object, Amazon S3 provides eventual consistency for 
> read-after-write.
> {quote}
> Some S3 file system implementations may actually execute such a request for 
> the about-to-write object and thus the read-after-write is only eventually 
> consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently 
> relies on a consistent read-after-write since it accesses the remote resource 
> to get file size and modification timestamp. Since we have access to the 
> local resource at that point, we can use its data instead and circumvent the 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Reopened] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3

2019-07-16 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reopened FLINK-8801:


> S3's eventual consistent read-after-write may fail yarn deployment of 
> resources to S3
> --------------------------------------------------------------------------------
>
> Key: FLINK-8801
> URL: https://issues.apache.org/jira/browse/FLINK-8801
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / Coordination
>Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.0, 1.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> According to 
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel:
> {quote}
> Amazon S3 provides read-after-write consistency for PUTS of new objects in 
> your S3 bucket in all regions with one caveat. The caveat is that if you make 
> a HEAD or GET request to the key name (to find if the object exists) before 
> creating the object, Amazon S3 provides eventual consistency for 
> read-after-write.
> {quote}
> Some S3 file system implementations may actually execute such a request for 
> the about-to-write object and thus the read-after-write is only eventually 
> consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently 
> relies on a consistent read-after-write since it accesses the remote resource 
> to get file size and modification timestamp. Since we have access to the 
> local resource at that point, we can use its data instead and circumvent the 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3

2019-07-16 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber resolved FLINK-8801.

   Resolution: Fixed
Fix Version/s: 1.10.0

merged to master via 770a404

> S3's eventual consistent read-after-write may fail yarn deployment of 
> resources to S3
> --------------------------------------------------------------------------------
>
> Key: FLINK-8801
> URL: https://issues.apache.org/jira/browse/FLINK-8801
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / Coordination
>Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.10.0, 1.4.3, 1.5.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> According to 
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel:
> {quote}
> Amazon S3 provides read-after-write consistency for PUTS of new objects in 
> your S3 bucket in all regions with one caveat. The caveat is that if you make 
> a HEAD or GET request to the key name (to find if the object exists) before 
> creating the object, Amazon S3 provides eventual consistency for 
> read-after-write.
> {quote}
> Some S3 file system implementations may actually execute such a request for 
> the about-to-write object and thus the read-after-write is only eventually 
> consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently 
> relies on a consistent read-after-write since it accesses the remote resource 
> to get file size and modification timestamp. Since we have access to the 
> local resource at that point, we can use its data instead and circumvent the 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Reopened] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3

2019-07-15 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reopened FLINK-8801:


> S3's eventual consistent read-after-write may fail yarn deployment of 
> resources to S3
> --------------------------------------------------------------------------------
>
> Key: FLINK-8801
> URL: https://issues.apache.org/jira/browse/FLINK-8801
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / Coordination
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> According to 
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel:
> {quote}
> Amazon S3 provides read-after-write consistency for PUTS of new objects in 
> your S3 bucket in all regions with one caveat. The caveat is that if you make 
> a HEAD or GET request to the key name (to find if the object exists) before 
> creating the object, Amazon S3 provides eventual consistency for 
> read-after-write.
> {quote}
> Some S3 file system implementations may actually execute such a request for 
> the about-to-write object and thus the read-after-write is only eventually 
> consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently 
> relies on a consistent read-after-write since it accesses the remote resource 
> to get file size and modification timestamp. Since we have access to the 
> local resource at that point, we can use its data instead and circumvent the 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-8801) S3's eventual consistent read-after-write may fail yarn deployment of resources to S3

2019-07-15 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-8801:
---
Affects Version/s: 1.9.0
   1.6.4
   1.7.2
   1.8.1

> S3's eventual consistent read-after-write may fail yarn deployment of 
> resources to S3
> --------------------------------------------------------------------------------
>
> Key: FLINK-8801
> URL: https://issues.apache.org/jira/browse/FLINK-8801
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, FileSystems, Runtime / Coordination
>Affects Versions: 1.4.0, 1.5.0, 1.6.4, 1.7.2, 1.8.1, 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.4.3, 1.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> According to 
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel:
> {quote}
> Amazon S3 provides read-after-write consistency for PUTS of new objects in 
> your S3 bucket in all regions with one caveat. The caveat is that if you make 
> a HEAD or GET request to the key name (to find if the object exists) before 
> creating the object, Amazon S3 provides eventual consistency for 
> read-after-write.
> {quote}
> Some S3 file system implementations may actually execute such a request for 
> the about-to-write object and thus the read-after-write is only eventually 
> consistent. {{org.apache.flink.yarn.Utils#setupLocalResource()}} currently 
> relies on a consistent read-after-write since it accesses the remote resource 
> to get file size and modification timestamp. Since we have access to the 
> local resource at that point, we can use its data instead and circumvent the 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (FLINK-13173) Only run openSSL tests if desired

2019-07-10 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber closed FLINK-13173.
---
Resolution: Fixed

Fixed in master via 6d79968f04d549d37b3bcda086a1484e78f61ac3

> Only run openSSL tests if desired
> --------------------------------------------------------------------------------
>
> Key: FLINK-13173
> URL: https://issues.apache.org/jira/browse/FLINK-13173
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network, Tests, Travis
>Affects Versions: 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Rename {{flink.tests.force-openssl}} to {{flink.tests.with-openssl}} and only 
> run openSSL-based unit tests if this is set. This way, we avoid systems where 
> the bundled dynamic libraries do not work. Travis seems to run fine and will 
> have this property set.
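The opt-in gating described above can be sketched with a plain system-property check. This is a hedged illustration only: Flink's actual tests use JUnit assumption mechanisms rather than this hypothetical helper, but the property name {{flink.tests.with-openssl}} matches the rename described in the issue.

```java
public class OpenSslTestGate {
    // Hypothetical sketch: openSSL-based unit tests only run when
    // -Dflink.tests.with-openssl is explicitly set, so machines where the
    // bundled dynamic libraries do not work skip them by default.
    static boolean openSslTestsEnabled() {
        return System.getProperty("flink.tests.with-openssl") != null;
    }

    public static void main(String[] args) {
        if (!openSslTestsEnabled()) {
            // Default on developer machines: property unset, tests skipped.
            System.out.println("skipping openSSL tests");
            return;
        }
        System.out.println("running openSSL tests");
    }
}
```

CI environments such as Travis would then pass {{-Dflink.tests.with-openssl}} explicitly to opt in.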



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-13172) JVM crash with dynamic netty-tcnative wrapper to openSSL on some OS

2019-07-09 Thread Nico Kruber (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-13172:

Description: 
The dynamically-linked wrapper library in 
{{flink-shaded-netty-tcnative-dynamic}} may not work on all systems, depending 
on how the system-provided openSSL library is built.
As a result, when trying to run Flink with {{security.ssl.provider: OPENSSL}} 
or just running a test based on {{SSLUtilsTest}} (which checks for openSSL 
availability which is enough to trigger the error below), the JVM will crash, 
e.g. with
- on SUSE-based systems:
{code}
/usr/lib64/jvm/java-openjdk/bin/java: relocation error: 
/tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_644115489043239307863.so:
 symbol TLSv1_2_server_method version OPENSSL_1.0.1 not defined in file 
libssl.so.1.0.0 with link time reference
{code}
- on Arch Linux:
{code}
/usr/lib/jvm/default/bin/java: relocation error: 
/tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_648476498532937980008.so:
 symbol SSLv3_method version OPENSSL_1.0.0 not defined in file libssl.so.1.0.0 
with link time reference
{code}

Possible solutions:
# build your own OS-dependent dynamically-linked {{netty-tcnative}} library and 
shade it in your own build of {{flink-shaded-netty-tcnative-dynamic}}, or
# use {{flink-shaded-netty-tcnative-static}}:
{code}
git clone https://github.com/apache/flink-shaded.git
cd flink-shaded
mvn clean package -Pinclude-netty-tcnative-static -pl 
flink-shaded-netty-tcnative-static
{code}
# get your OS-dependent build into netty-tcnative as a special branch similar 
to what they currently do with Fedora-based systems

  was:
The dynamically-linked wrapper library in 
{{flink-shaded-netty-tcnative-dynamic}} may not work on all systems, depending 
on how the system-provided openSSL library is built.
As a result, when trying to run Flink with {{security.ssl.provider: OPENSSL}} 
or running a test based on {{SSLUtilsTest}} (which checks for openSSL 
availability), the JVM will crash, e.g. with
- on SUSE-based systems:
{code}
/usr/lib64/jvm/java-openjdk/bin/java: relocation error: 
/tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_644115489043239307863.so:
 symbol TLSv1_2_server_method version OPENSSL_1.0.1 not defined in file 
libssl.so.1.0.0 with link time reference
{code}
- on Arch Linux:
{code}
/usr/lib/jvm/default/bin/java: relocation error: 
/tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_648476498532937980008.so:
 symbol SSLv3_method version OPENSSL_1.0.0 not defined in file libssl.so.1.0.0 
with link time reference
{code}

Possible solutions:
# build your own OS-dependent dynamically-linked {{netty-tcnative}} library and 
shade it in your own build of {{flink-shaded-netty-tcnative-dynamic}}, or
# use {{flink-shaded-netty-tcnative-static}}:
{code}
git clone https://github.com/apache/flink-shaded.git
cd flink-shaded
mvn clean package -Pinclude-netty-tcnative-static -pl 
flink-shaded-netty-tcnative-static
{code}
# get your OS-dependent build into netty-tcnative as a special branch similar 
to what they currently do with Fedora-based systems


> JVM crash with dynamic netty-tcnative wrapper to openSSL on some OS
> --------------------------------------------------------------------------------
>
> Key: FLINK-13172
> URL: https://issues.apache.org/jira/browse/FLINK-13172
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network, Tests
>Affects Versions: 1.9.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>
> The dynamically-linked wrapper library in 
> {{flink-shaded-netty-tcnative-dynamic}} may not work on all systems, 
> depending on how the system-provided openSSL library is built.
> As a result, when trying to run Flink with {{security.ssl.provider: OPENSSL}} 
> or just running a test based on {{SSLUtilsTest}} (which checks for openSSL 
> availability which is enough to trigger the error below), the JVM will crash, 
> e.g. with
> - on SUSE-based systems:
> {code}
> /usr/lib64/jvm/java-openjdk/bin/java: relocation error: 
> /tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_644115489043239307863.so:
>  symbol TLSv1_2_server_method version OPENSSL_1.0.1 not defined in file 
> libssl.so.1.0.0 with link time reference
> {code}
> - on Arch Linux:
> {code}
> /usr/lib/jvm/default/bin/java: relocation error: 
> /tmp/liborg_apache_flink_shaded_netty4_netty_tcnative_linux_x86_648476498532937980008.so:
>  symbol SSLv3_method version OPENSSL_1.0.0 not defined in file 
> libssl.so.1.0.0 with link time reference
> {code}
> Possible solutions:
> # build your own OS-dependent dynamically-linked {{netty-tcnative}} library 
> and shade it in your own build of {{flink-shaded-netty-tcnative-dynamic}}, or
> # use {{flink-shaded-netty-tcnative-static}}:
> {code}
> git clone 
