[jira] [Updated] (FLINK-3814) Update code style guide regarding Preconditions
     [ https://issues.apache.org/jira/browse/FLINK-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-3814:
----------------------------------
    Labels: auto-deprioritized-major  (was: stale-major)

> Update code style guide regarding Preconditions
> -----------------------------------------------
>
>                 Key: FLINK-3814
>                 URL: https://issues.apache.org/jira/browse/FLINK-3814
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation
>            Reporter: Chesnay Schepler
>            Priority: Major
>              Labels: auto-deprioritized-major
>
> We recently added a Preconditions class to replace Guava when possible.
> The Code Style Guide however still suggests to use Guava when possible
> (explicitly naming the ported checkNotNull), and never mentions our own
> Preconditions class.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
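A minimal sketch of the kind of Guava-free helper the ticket refers to. The real class is `org.apache.flink.util.Preconditions`; the class name `PreconditionsSketch` and the exact method bodies below are illustrative, not Flink's implementation. The method names mirror the Guava API that was ported (`checkNotNull`, `checkArgument`).

```java
// Sketch of a Guava-free precondition helper, in the spirit of
// org.apache.flink.util.Preconditions. Illustrative only: the class name
// and bodies are assumptions, not Flink's actual code.
public final class PreconditionsSketch {

    private PreconditionsSketch() {}

    /** Returns the reference if non-null, otherwise throws NullPointerException. */
    public static <T> T checkNotNull(T reference, String errorMessage) {
        if (reference == null) {
            throw new NullPointerException(errorMessage);
        }
        return reference;
    }

    /** Throws IllegalArgumentException if the condition does not hold. */
    public static void checkArgument(boolean condition, String errorMessage) {
        if (!condition) {
            throw new IllegalArgumentException(errorMessage);
        }
    }

    public static void main(String[] args) {
        String name = checkNotNull("flink", "name must not be null");
        checkArgument(!name.isEmpty(), "name must not be empty");
        System.out.println(name);
    }
}
```

With a helper like this in the code base, a style guide can recommend the project's own class instead of pointing contributors at Guava.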
[jira] [Updated] (FLINK-5676) Grouping on nested fields does not work
     [ https://issues.apache.org/jira/browse/FLINK-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-5676:
----------------------------------
    Labels: auto-deprioritized-major  (was: stale-major)

> Grouping on nested fields does not work
> ---------------------------------------
>
>                 Key: FLINK-5676
>                 URL: https://issues.apache.org/jira/browse/FLINK-5676
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API
>            Reporter: Timo Walther
>            Priority: Major
>              Labels: auto-deprioritized-major
>
> {code}
> tEnv
>   .fromDataSet(pojoWithinnerPojo)
>   .groupBy("innerPojo.get('line')")
>   .select("innerPojo.get('line')")
> {code}
> fails with
> {code}
> ValidationException: Cannot resolve [innerPojo] given input ['innerPojo.get(line)].
> {code}
> I don't know if we want to support that but the exception should be more
> helpful anyway.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-4620) Automatically creating savepoints
     [ https://issues.apache.org/jira/browse/FLINK-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-4620:
----------------------------------
    Labels: auto-deprioritized-major  (was: stale-major)

> Automatically creating savepoints
> ---------------------------------
>
>                 Key: FLINK-4620
>                 URL: https://issues.apache.org/jira/browse/FLINK-4620
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / State Backends
>    Affects Versions: 1.1.2
>            Reporter: Niels Basjes
>            Priority: Major
>              Labels: auto-deprioritized-major
>
> In the current versions of Flink you can run an external command and then a
> savepoint is persisted in a durable location.
> Feature request: Make this a lot more automatic and easy to use.
> _Proposed workflow_
> # In my application I do something like this:
> {code}
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.setStateBackend(new FsStateBackend("hdfs:///tmp/applicationState"));
> env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);
> env.enableAutomaticSavePoints(30);
> env.enableAutomaticSavePointCleaner(10);
> {code}
> # When I start the application for the first time the state backend is
> 'empty'. I expect the system to start in a clean state.
> After 10 minutes (30ms) a savepoint is created and stored.
> # When I stop and start the topology again it will automatically restore the
> last available savepoint.
> Things to think about:
> * Note that this feature still means the manual version is useful!!
> * What to do on startup if the state is incompatible with the topology? Fail
> the startup?
> * How many automatic savepoints do we keep? Only the last one?
> * Perhaps the API should allow multiple automatic savepoints at different
> intervals in different locations.
> {code}
> // Make every 10 minutes and keep the last 10
> env.enableAutomaticSavePoints(30, new FsStateBackend("hdfs:///tmp/applicationState"), 10);
> // Make every 24 hours and keep the last 30
> // Useful for being able to reproduce a problem a few days later
> env.enableAutomaticSavePoints(8640, new FsStateBackend("hdfs:///tmp/applicationDailyStateSnapshot"), 30);
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-3273) Remove Scala dependency from flink-streaming-java
     [ https://issues.apache.org/jira/browse/FLINK-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-3273:
----------------------------------
    Priority: Minor  (was: Major)

> Remove Scala dependency from flink-streaming-java
> -------------------------------------------------
>
>                 Key: FLINK-3273
>                 URL: https://issues.apache.org/jira/browse/FLINK-3273
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Maximilian Michels
>            Priority: Minor
>              Labels: auto-deprioritized-major
>
> {{flink-streaming-java}} depends on Scala through {{flink-clients}},
> {{flink-runtime}}, and {{flink-testing-utils}}. We should get rid of the
> Scala dependency just like we did for {{flink-java}}. Integration tests and
> utilities which depend on Scala should be moved to {{flink-tests}}.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-13847) Update release scripts to also update docs/_config.yml
     [ https://issues.apache.org/jira/browse/FLINK-13847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336436#comment-17336436 ]

Flink Jira Bot commented on FLINK-13847:
----------------------------------------
This issue was labeled "stale-major" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Major,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> Update release scripts to also update docs/_config.yml
> ------------------------------------------------------
>
>                 Key: FLINK-13847
>                 URL: https://issues.apache.org/jira/browse/FLINK-13847
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation, Release System
>            Reporter: Tzu-Li (Gordon) Tai
>            Priority: Major
>              Labels: stale-major
>
> During the 1.9.0 release process, we missed quite a few configuration updates
> in {{docs/_config.yml}} related to Flink versions. This should be done
> automatically via the release scripts.
> The settings in that file that need to be touched on every major release
> include:
> * version
> * version_title
> * github_branch
> * baseurl
> * stable_baseurl
> * javadocs_baseurl
> * pythondocs_baseurl
> * is_stable
> * Add new link to previous_docs
> This can probably be done via the {{tools/releasing/create_release_branch.sh}}
> script, which is used for every major release.
> We should also update the release guide in the project wiki to cover checking
> that file as an item in checklists.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-2838) Inconsistent use of URL and Path classes for resources
     [ https://issues.apache.org/jira/browse/FLINK-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336904#comment-17336904 ]

Flink Jira Bot commented on FLINK-2838:
---------------------------------------
This issue was labeled "stale-major" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Major,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> Inconsistent use of URL and Path classes for resources
> ------------------------------------------------------
>
>                 Key: FLINK-2838
>                 URL: https://issues.apache.org/jira/browse/FLINK-2838
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 0.10.0
>            Reporter: Maximilian Michels
>            Priority: Major
>              Labels: stale-major
>
> Flink uses either Path or URL to point to resources. For example,
> {{JobGraph.addJar}} expects a Path while {{JobWithJars}} expects URLs for
> JARs.
> The URL class requires an explicit file scheme (e.g. file://, hdfs://) while
> the Path class accepts any kind of well-formed path. We should make clear
> which one to use for resources.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
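The asymmetry the ticket describes can be shown with plain JDK classes (Flink's own {{org.apache.flink.core.fs.Path}} is not used here, so this is an analogy rather than the Flink API): {{java.net.URL}} rejects a scheme-less path outright, while the same string is a perfectly well-formed URI/path.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Illustrates why mixing URL- and Path-typed resource references is
// error-prone: java.net.URL demands an explicit scheme (file://, hdfs://),
// while a scheme-less string is a valid path/URI. Plain JDK classes are
// used as a stand-in for Flink's Path class.
public class UrlVsPath {
    public static void main(String[] args) {
        try {
            new URL("/tmp/job.jar");          // no scheme -> rejected
            System.out.println("accepted");
        } catch (MalformedURLException e) {
            System.out.println("URL rejected: no protocol");
        }

        // The same scheme-less string is fine as a URI (and as a path).
        URI uri = URI.create("/tmp/job.jar");
        System.out.println("URI path: " + uri.getPath());
    }
}
```

An API that takes URLs therefore silently imposes a "must carry a scheme" contract that Path-based APIs do not, which is exactly the inconsistency the ticket asks to resolve.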
[jira] [Commented] (FLINK-4947) Make all configuration possible via flink-conf.yaml and CLI.
     [ https://issues.apache.org/jira/browse/FLINK-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336841#comment-17336841 ]

Flink Jira Bot commented on FLINK-4947:
---------------------------------------
This issue was labeled "stale-major" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Major,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> Make all configuration possible via flink-conf.yaml and CLI.
> ------------------------------------------------------------
>
>                 Key: FLINK-4947
>                 URL: https://issues.apache.org/jira/browse/FLINK-4947
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Jamie Grier
>            Priority: Major
>              Labels: stale-major
>
> I think it's important to make all configuration possible via the
> flink-conf.yaml and the command line.
> As an example: To enable "externalizedCheckpoints" you must actually call
> the StreamExecutionEnvironment#enableExternalizedCheckpoints() method from
> your Flink program.
> Another example of this would be configuring the RocksDB state backend.
> I think it's important to make deployment flexible and easy to build tools
> around. For example, the infrastructure teams that make these configuration
> decisions and provide tools for deploying Flink apps will be different from
> the teams deploying apps. The teams writing apps should not have to set all
> of this lower-level configuration up in their programs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-3642) Disentangle ExecutionConfig
     [ https://issues.apache.org/jira/browse/FLINK-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336883#comment-17336883 ]

Flink Jira Bot commented on FLINK-3642:
---------------------------------------
This issue was labeled "stale-major" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Major,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> Disentangle ExecutionConfig
> ---------------------------
>
>                 Key: FLINK-3642
>                 URL: https://issues.apache.org/jira/browse/FLINK-3642
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataSet, API / DataStream
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Priority: Major
>              Labels: stale-major
>
> Initially, the {{ExecutionConfig}} started out as a configuration for the
> behaviour of the system with respect to the associated job. As such it
> stored information about the restart strategy, registered types and the
> parallelism of the job. However, the {{ExecutionConfig}} has since become
> more of an easy entry point to pass information into the system. The user
> can now set arbitrary information as part of the {{GlobalJobParameters}} in
> the {{ExecutionConfig}}, which is piped to all kinds of different locations
> in the system, e.g. the serializers, JM, ExecutionGraph, TM, etc.
> This mixture of user code classes with system parameters makes it really
> cumbersome to send system information around, because you always need a user
> code class loader to deserialize it. Furthermore, there are different means
> by which the {{ExecutionConfig}} is passed to the system. One is giving it
> to the {{Serializers}} created in the JavaAPIPostPass and another is giving
> it directly to the {{JobGraph}}, for example. The problem is that the
> {{ExecutionConfig}} contains information which is required at different
> stages of a program execution.
> I think it would be beneficial to disentangle the {{ExecutionConfig}} a
> little bit along the lines of the different concerns for which the
> {{ExecutionConfig}} is used currently.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-10616) Jepsen test fails while tearing down Hadoop
     [ https://issues.apache.org/jira/browse/FLINK-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-10616:
-----------------------------------
    Priority: Minor  (was: Major)

> Jepsen test fails while tearing down Hadoop
> -------------------------------------------
>
>                 Key: FLINK-10616
>                 URL: https://issues.apache.org/jira/browse/FLINK-10616
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 1.6.1, 1.7.0
>            Reporter: Gary Yao
>            Priority: Minor
>              Labels: auto-deprioritized-major
>
> While tearing down Hadoop, the tests sporadically fail with the exception
> below:
> {noformat}
> Caused by: java.lang.RuntimeException: sudo -S -u root bash -c "cd /; ps aux | grep hadoop | grep -v grep | awk \"\{print \\\$2}\" | xargs kill -9" returned non-zero exit status 123 on 172.31.39.235. STDOUT:
> STDERR:
>         at jepsen.control$throw_on_nonzero_exit.invokeStatic(control.clj:129) ~[jepsen-0.1.10.jar:na]
>         at jepsen.control$throw_on_nonzero_exit.invoke(control.clj:122) ~[jepsen-0.1.10.jar:na]
>         at jepsen.control$exec_STAR_.invokeStatic(control.clj:166) ~[jepsen-0.1.10.jar:na]
>         at jepsen.control$exec_STAR_.doInvoke(control.clj:163) ~[jepsen-0.1.10.jar:na]
>         at clojure.lang.RestFn.applyTo(RestFn.java:137) [clojure-1.9.0.jar:na]
>         at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
>         at clojure.core$apply.invoke(core.clj:652) ~[clojure-1.9.0.jar:na]
>         at jepsen.control$exec.invokeStatic(control.clj:182) ~[jepsen-0.1.10.jar:na]
>         at jepsen.control$exec.doInvoke(control.clj:176) ~[jepsen-0.1.10.jar:na]
>         at clojure.lang.RestFn.invoke(RestFn.java:2088) [clojure-1.9.0.jar:na]
>         at jepsen.control.util$grepkill_BANG_.invokeStatic(util.clj:197) ~[classes/:na]
>         at jepsen.control.util$grepkill_BANG_.invoke(util.clj:191) ~[classes/:na]
>         at jepsen.control.util$grepkill_BANG_.invokeStatic(util.clj:194) ~[classes/:na]
>         at jepsen.control.util$grepkill_BANG_.invoke(util.clj:191) ~[classes/:na]
>         at jepsen.flink.hadoop$db$reify__3102.teardown_BANG_(hadoop.clj:128) ~[classes/:na]
>         at jepsen.flink.db$combined_db$reify__217$fn__220.invoke(db.clj:119) ~[na:na]
>         at clojure.core$map$fn__5587.invoke(core.clj:2745) ~[clojure-1.9.0.jar:na]
>         at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.9.0.jar:na]
>         at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.9.0.jar:na]
>         at clojure.lang.RT.seq(RT.java:528) ~[clojure-1.9.0.jar:na]
>         at clojure.core$seq__5124.invokeStatic(core.clj:137) ~[clojure-1.9.0.jar:na]
>         at clojure.core$dorun.invokeStatic(core.clj:3125) ~[clojure-1.9.0.jar:na]
>         at clojure.core$doall.invokeStatic(core.clj:3140) ~[clojure-1.9.0.jar:na]
>         at clojure.core$doall.invoke(core.clj:3140) ~[clojure-1.9.0.jar:na]
>         at jepsen.flink.db$combined_db$reify__217.teardown_BANG_(db.clj:119) ~[na:na]
>         at jepsen.db$fn__2137$G__2133__2141.invoke(db.clj:8) ~[jepsen-0.1.10.jar:na]
>         at jepsen.db$fn__2137$G__2132__2146.invoke(db.clj:8) ~[jepsen-0.1.10.jar:na]
>         at clojure.core$partial$fn__5561.invoke(core.clj:2617) ~[clojure-1.9.0.jar:na]
>         at jepsen.control$on_nodes$fn__2116.invoke(control.clj:372) ~[jepsen-0.1.10.jar:na]
>         at clojure.lang.AFn.applyToHelper(AFn.java:154) ~[clojure-1.9.0.jar:na]
>         at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.9.0.jar:na]
>         at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
>         at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1965) ~[clojure-1.9.0.jar:na]
>         at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1965) ~[clojure-1.9.0.jar:na]
>         at clojure.lang.RestFn.applyTo(RestFn.java:142) [clojure-1.9.0.jar:na]
>         at clojure.core$apply.invokeStatic(core.clj:661) ~[clojure-1.9.0.jar:na]
>         at clojure.core$bound_fn_STAR_$fn__5471.doInvoke(core.clj:1995) ~[clojure-1.9.0.jar:na]
>         at clojure.lang.RestFn.invoke(RestFn.java:408) [clojure-1.9.0.jar:na]
>         at jepsen.util$real_pmap$launcher__1168$fn__1169.invoke(util.clj:49) ~[jepsen-0.1.10.jar:na]
>         at clojure.core$binding_conveyor_fn$fn__5476.invoke(core.clj:2022) ~[clojure-1.9.0.jar:na]
>         at clojure.lang.AFn.call(AFn.java:18) ~[clojure-1.9.0.jar:na]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_171]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_171]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_171]
>         at
[jira] [Commented] (FLINK-11942) Flink kinesis connector throws kinesis producer daemon fatalError
     [ https://issues.apache.org/jira/browse/FLINK-11942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336507#comment-17336507 ]

Flink Jira Bot commented on FLINK-11942:
----------------------------------------
This issue was labeled "stale-major" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Major,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> Flink kinesis connector throws kinesis producer daemon fatalError
> -----------------------------------------------------------------
>
>                 Key: FLINK-11942
>                 URL: https://issues.apache.org/jira/browse/FLINK-11942
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kinesis
>    Affects Versions: 1.7.2
>            Reporter: indraneel r
>            Priority: Major
>              Labels: stale-major
>
> Flink connector crashes repeatedly with the following error:
> {quote}
> 437062 [kpl-callback-pool-28-thread-0] WARN org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer - An exception occurred while processing a record
> java.lang.RuntimeException: Unexpected error
> at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:533)
> at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:513)
> at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.add(Daemon.java:183)
> at org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:536)
> at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
> at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
> at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
> at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
> at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
> at com.myorg.bi.web.sessionization.windowing.SessionProcessingFunction$$anonfun$process$2.apply(SessionProcessingFunction.scala:37)
> at com.myorg.bi.web.sessionization.windowing.SessionProcessingFunction$$anonfun$process$2.apply(SessionProcessingFunction.scala:33)
> at scala.collection.immutable.Stream.foreach(Stream.scala:594)
> at com.myorg.bi.web.sessionization.windowing.SessionProcessingFunction.process(SessionProcessingFunction.scala:33)
> at com.myorg.bi.web.sessionization.windowing.SessionProcessingFunction.process(SessionProcessingFunction.scala:13)
> at org.apache.flink.streaming.api.scala.function.util.ScalaProcessWindowFunctionWrapper.process(ScalaProcessWindowFunctionWrapper.scala:63)
> at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:50)
> at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:32)
> at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:546)
> at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onEventTime(WindowOperator.java:454)
> at org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.advanceWatermark(InternalTimerServiceImpl.java:251)
> at org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:128)
> at org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:775)
> at org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262)
> at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
> at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
> at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184)
> at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> at
[jira] [Commented] (FLINK-13695) Integrate checkpoint notifications into StreamTask's lifecycle
     [ https://issues.apache.org/jira/browse/FLINK-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336972#comment-17336972 ]

Flink Jira Bot commented on FLINK-13695:
----------------------------------------
This issue was labeled "stale-critical" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Critical,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> Integrate checkpoint notifications into StreamTask's lifecycle
> --------------------------------------------------------------
>
>                 Key: FLINK-13695
>                 URL: https://issues.apache.org/jira/browse/FLINK-13695
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>              Labels: stale-critical
>
> The {{StreamTask's}} asynchronous checkpoint notifications are decoupled from
> the {{StreamTask's}} lifecycle. Consequently, it can happen that a
> {{StreamTask}} is terminating/cancelling and still sends asynchronous
> checkpoint notifications (e.g. acknowledge/decline checkpoint notifications).
> This is problematic because a cancelling/terminating {{StreamTask}} might
> cause an asynchronous checkpoint to fail, which is expected and should not be
> reported to the {{JobMaster}}.
> Hence, the checkpoint notifications should be coupled with the
> {{StreamTask's}} lifecycle (e.g. notifications must be sent from the
> {{StreamTask's}} main thread) and if not valid, then they need to be filtered
> out/suppressed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-10297) PostVersionedIOReadableWritable ignores result of InputStream.read(...)
     [ https://issues.apache.org/jira/browse/FLINK-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336980#comment-17336980 ]

Flink Jira Bot commented on FLINK-10297:
----------------------------------------
This issue was labeled "stale-critical" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Critical,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> PostVersionedIOReadableWritable ignores result of InputStream.read(...)
> -----------------------------------------------------------------------
>
>                 Key: FLINK-10297
>                 URL: https://issues.apache.org/jira/browse/FLINK-10297
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 1.4.2, 1.5.3, 1.6.0
>            Reporter: Stefan Richter
>            Priority: Critical
>              Labels: stale-critical
>
> PostVersionedIOReadableWritable ignores the result of {{InputStream.read(...)}}.
> Probably the intention was to invoke {{readFully}}. As it is now, this can
> lead to a corrupted deserialization.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
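The bug class behind this ticket can be demonstrated with plain JDK streams: {{InputStream.read(byte[])}} may legally return fewer bytes than requested, so ignoring its return value can leave a buffer partially filled, while {{DataInputStream.readFully}} loops until the buffer is complete. The dribbling stream below is a test double for illustration, not Flink code.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Shows why ignoring the return value of InputStream.read(byte[]) corrupts
// deserialization: a stream is allowed to deliver fewer bytes than asked for.
// DataInputStream.readFully is the safe alternative the ticket suggests.
public class ReadVsReadFully {

    /** An InputStream that hands out at most one byte per read() call. */
    static class DribblingStream extends InputStream {
        private final byte[] data;
        private int pos = 0;

        DribblingStream(byte[] data) { this.data = data; }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            int c = read();                 // deliver a single byte at a time
            if (c == -1) return -1;
            b[off] = (byte) c;
            return 1;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {1, 2, 3, 4};

        // Buggy pattern: one read() call, return value ignored.
        byte[] partial = new byte[4];
        int n = new DribblingStream(payload).read(partial);
        System.out.println("single read returned " + n + " of 4 bytes");

        // Safe pattern: readFully keeps reading until the buffer is full.
        byte[] full = new byte[4];
        new DataInputStream(new DribblingStream(payload)).readFully(full);
        System.out.println("readFully filled all " + full.length + " bytes");
    }
}
```

With the buggy pattern, the three unread bytes stay at their zero defaults and the version header that follows is parsed from garbage, which matches the "corrupted deserialization" the reporter describes.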
[jira] [Updated] (FLINK-10409) Collection data sink does not propagate exceptions
     [ https://issues.apache.org/jira/browse/FLINK-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-10409:
-----------------------------------
    Labels: auto-deprioritized-major  (was: stale-major)

> Collection data sink does not propagate exceptions
> --------------------------------------------------
>
>                 Key: FLINK-10409
>                 URL: https://issues.apache.org/jira/browse/FLINK-10409
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>            Reporter: Dawid Wysakowicz
>            Priority: Major
>              Labels: auto-deprioritized-major
>
> I would assume that this test should fail with {{RuntimeException}}, but it
> actually runs just fine.
> {code}
> @Test
> public void testA() throws Exception {
>     StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>     List<String> resultList = new ArrayList<>();
>     SingleOutputStreamOperator<String> result = env.fromElements("A").map(obj -> {
>         throw new RuntimeException();
>     });
>     DataStreamUtils.collect(result).forEachRemaining(resultList::add);
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-3240) Remove or document DataStream(.global|.forward)
     [ https://issues.apache.org/jira/browse/FLINK-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-3240:
----------------------------------
    Labels: auto-deprioritized-major  (was: stale-major)

> Remove or document DataStream(.global|.forward)
> -----------------------------------------------
>
>                 Key: FLINK-3240
>                 URL: https://issues.apache.org/jira/browse/FLINK-3240
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>            Reporter: Robert Metzger
>            Priority: Major
>              Labels: auto-deprioritized-major
>
> It seems that DataStream.global() and DataStream.forward() are not documented.
> From the JavaDocs, I don't really get why we need them.
> For DataStream.global(), users can just set the parallelism of the following
> operator to p=1.
> Forward is the default behavior anyways.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-6814) Store information about whether or not a registered state is queryable in checkpoints
     [ https://issues.apache.org/jira/browse/FLINK-6814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336742#comment-17336742 ]

Flink Jira Bot commented on FLINK-6814:
---------------------------------------
This issue was labeled "stale-major" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Major,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> Store information about whether or not a registered state is queryable in
> checkpoints
> --------------------------------------------------------------------------
>
>                 Key: FLINK-6814
>                 URL: https://issues.apache.org/jira/browse/FLINK-6814
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / Queryable State
>            Reporter: Tzu-Li (Gordon) Tai
>            Priority: Major
>              Labels: stale-major
>
> Currently, we store almost all information that comes with the registered
> state's {{StateDescriptor}}s (state name, state serializer, etc.) in
> checkpoints, except information about whether or not the state is
> queryable. I propose to add that as well.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-11402) User code can fail with an UnsatisfiedLinkError in the presence of multiple classloaders
     [ https://issues.apache.org/jira/browse/FLINK-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336529#comment-17336529 ]

Flink Jira Bot commented on FLINK-11402:
----------------------------------------
This issue was labeled "stale-major" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Major,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> User code can fail with an UnsatisfiedLinkError in the presence of multiple
> classloaders
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-11402
>                 URL: https://issues.apache.org/jira/browse/FLINK-11402
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation, Runtime / Task
>    Affects Versions: 1.7.0
>            Reporter: Ufuk Celebi
>            Priority: Major
>              Labels: stale-major, starter
>         Attachments: hello-snappy-1.0-SNAPSHOT.jar, hello-snappy.tgz
>
> As reported on the user mailing list thread "[`env.java.opts` not persisting
> after job canceled or failed and then
> restarted|https://lists.apache.org/thread.html/37cc1b628e16ca6c0bacced5e825de8057f88a8d601b90a355b6a291@%3Cuser.flink.apache.org%3E]",
> there can be issues with using native libraries and user code class loading.
> h2. Steps to reproduce
> I was able to reproduce the issue reported on the mailing list using
> [snappy-java|https://github.com/xerial/snappy-java] in a user program.
> Running the attached user program works fine on initial submission, but
> results in a failure when re-executed.
> I'm using Flink 1.7.0 with a standalone cluster started via
> {{bin/start-cluster.sh}}.
> 0. Unpack attached Maven project and build using {{mvn clean package}} *or*
> directly use attached {{hello-snappy-1.0-SNAPSHOT.jar}}
> 1. Download
> [snappy-java-1.1.7.2.jar|http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.7.2/snappy-java-1.1.7.2.jar]
> and unpack libsnappyjava for your system:
> {code}
> jar tf snappy-java-1.1.7.2.jar | grep libsnappy
> ...
> org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so
> ...
> org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib
> ...
> {code}
> 2. Configure system library path to {{libsnappyjava}} in {{flink-conf.yaml}}
> (path needs to be adjusted for your system):
> {code}
> env.java.opts: -Djava.library.path=/.../org/xerial/snappy/native/Mac/x86_64
> {code}
> 3. Run attached {{hello-snappy-1.0-SNAPSHOT.jar}}
> {code}
> bin/flink run hello-snappy-1.0-SNAPSHOT.jar
> Starting execution of program
> Program execution finished
> Job with JobID ae815b918dd7bc64ac8959e4e224f2b4 has finished.
> Job Runtime: 359 ms
> {code}
> 4. Rerun attached {{hello-snappy-1.0-SNAPSHOT.jar}}
> {code}
> bin/flink run hello-snappy-1.0-SNAPSHOT.jar
> Starting execution of program
> ------------------------------------------------------------
> The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7d69baca58f33180cb9251449ddcd396)
> at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
> at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487)
> at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
> at com.github.uce.HelloSnappy.main(HelloSnappy.java:18)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
> at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
> at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
> at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
> at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
> at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
> at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
> at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
> at
[jira] [Commented] (FLINK-12118) Documentation in left navigation is not in step with its content
     [ https://issues.apache.org/jira/browse/FLINK-12118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336498#comment-17336498 ]

Flink Jira Bot commented on FLINK-12118:
----------------------------------------
This issue was labeled "stale-major" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Major,
please raise the priority and ask a committer to assign you the issue or
revive the public discussion.

> Documentation in left navigation is not in step with its content
> ----------------------------------------------------------------
>
>                 Key: FLINK-12118
>                 URL: https://issues.apache.org/jira/browse/FLINK-12118
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation
>            Reporter: Yu Haidong
>            Priority: Major
>              Labels: stale-major
>         Attachments: 图片 11.png, 图片 12.png
>
> Hi Team:
> A little bug, I think.
> The left navigation of the official website documentation shows:
> "1.8(snapshot)",
> but when you click it, you will see:
> "1.9-SNAPSHOT"
> in the content. Maybe it is a little bug.
> Thank you!
>
> Haidong

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-2197) Scala API is not working when using batch and streaming API in the same program
[ https://issues.apache.org/jira/browse/FLINK-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-2197: -- Labels: auto-deprioritized-major (was: stale-major) > Scala API is not working when using batch and streaming API in the same > program > --- > > Key: FLINK-2197 > URL: https://issues.apache.org/jira/browse/FLINK-2197 > Project: Flink > Issue Type: Bug > Components: API / DataSet, API / DataStream, API / Scala >Reporter: Till Rohrmann >Priority: Major > Labels: auto-deprioritized-major > > If one uses the Scala batch and streaming API from within the same > program and imports both corresponding package objects, then the Scala API no > longer works because it is lacking the implicit {{TypeInformation}} values. > The reason for this is that both package objects contain an implicit function > {{createTypeInformation}}. This creates an ambiguity which is not possible > for the Scala compiler to resolve. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-13113) Introduce range partition in blink
[ https://issues.apache.org/jira/browse/FLINK-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-13113: --- Priority: Minor (was: Major) > Introduce range partition in blink > -- > > Key: FLINK-13113 > URL: https://issues.apache.org/jira/browse/FLINK-13113 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Runtime >Reporter: Jingsong Lee >Priority: Minor > Labels: auto-deprioritized-major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-5351) Make the TypeExtractor support functions with more than 2 inputs
[ https://issues.apache.org/jira/browse/FLINK-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-5351: -- Priority: Minor (was: Major) > Make the TypeExtractor support functions with more than 2 inputs > > > Key: FLINK-5351 > URL: https://issues.apache.org/jira/browse/FLINK-5351 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System, Library / Graph > Processing (Gelly) >Reporter: Vasia Kalavri >Priority: Minor > Labels: auto-deprioritized-major > > Currently, the TypeExtractor doesn't support functions with more than 2 > inputs. We found that adding such support would be a useful feature for Gelly > in FLINK-5097. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-6626) Unifying lifecycle management of SubmittedJobGraph- and CompletedCheckpointStore
[ https://issues.apache.org/jira/browse/FLINK-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336749#comment-17336749 ] Flink Jira Bot commented on FLINK-6626: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Unifying lifecycle management of SubmittedJobGraph- and > CompletedCheckpointStore > > > Key: FLINK-6626 > URL: https://issues.apache.org/jira/browse/FLINK-6626 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Priority: Major > Labels: stale-major > > Currently, Flink uses the {{SubmittedJobGraphStore}} to persist {{JobGraphs}} > such that they can be recovered in case of failures. The > {{SubmittedJobGraphStore}} is managed by the {{JobManager}}. Additionally, > Flink has the {{CompletedCheckpointStore}} which stores checkpoints for a > given {{ExecutionGraph}}/job. The {{CompletedCheckpointStore}} is managed by > the {{CheckpointCoordinator}}. > The {{SubmittedJobGraphStore}} and the {{CompletedCheckpointStore}} are > somewhat related because in the latter we store checkpoints for jobs > contained in the former. I think it would be nice wrt lifecycle management to > let the {{SubmittedJobGraphStore}} manage the lifecycle of the > {{CompletedCheckpointStore}}, because often it does not make much sense to > keep only checkpoints without a job or a job without checkpoints. > An idea would be that when we register a job with the {{SubmittedJobGraphStore}} > then it returns a {{CompletedCheckpointStore}}. This store can then be given > to the {{CheckpointCoordinator}} to store the checkpoints. 
When a job enters > a terminal state it could be the responsibility of the > {{SubmittedJobGraphStore}} to decide what to do with the job data > ({{JobGraph}} and {{Checkpoints}}), e.g. keeping it or cleaning it up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
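The registration idea described in FLINK-6626 above can be sketched roughly as follows. All names here are illustrative stand-ins for Flink's actual interfaces, and the in-memory map is only a placeholder for a real (e.g. ZooKeeper-backed) store:

```java
// Hypothetical sketch: registering a JobGraph hands back the checkpoint
// store for that job, so the job-graph store owns both lifecycles and can
// clean them up together when the job reaches a terminal state.
import java.util.HashMap;
import java.util.Map;

class JobGraphStoreSketch {

    /** Stand-in for Flink's CompletedCheckpointStore. */
    static class CheckpointStore {
        final String jobId;
        CheckpointStore(String jobId) { this.jobId = jobId; }
    }

    private final Map<String, CheckpointStore> stores = new HashMap<>();

    /** Registering a job returns the checkpoint store tied to its lifecycle. */
    CheckpointStore registerJob(String jobId) {
        return stores.computeIfAbsent(jobId, CheckpointStore::new);
    }

    /** On a terminal state, this store decides what happens to the job data. */
    void jobReachedTerminalState(String jobId, boolean keepData) {
        if (!keepData) {
            stores.remove(jobId); // drop JobGraph and checkpoints together
        }
    }

    boolean hasCheckpoints(String jobId) {
        return stores.containsKey(jobId);
    }
}
```

The returned store would then be handed to the CheckpointCoordinator, which never has to know how cleanup is coordinated.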
[jira] [Commented] (FLINK-10106) Include test name in temp directory of e2e test
[ https://issues.apache.org/jira/browse/FLINK-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336597#comment-17336597 ] Flink Jira Bot commented on FLINK-10106: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Include test name in temp directory of e2e test > --- > > Key: FLINK-10106 > URL: https://issues.apache.org/jira/browse/FLINK-10106 > Project: Flink > Issue Type: Improvement > Components: Tests >Affects Versions: 1.6.0 >Reporter: Till Rohrmann >Priority: Major > Labels: pull-request-available, stale-major > Time Spent: 10m > Remaining Estimate: 0h > > For better debuggability it would help to include the name of the e2e test in > the created temporary testing directory > {{temp-test-directory--UUID}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
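The naming scheme proposed in FLINK-10106 above could look like the sketch below. The method name and exact pattern are assumptions; the real e2e scripts are shell, and their pattern may differ:

```java
import java.util.UUID;

class TempDirNaming {
    // Embed the e2e test name between the fixed prefix and the UUID so a
    // leftover directory is attributable to the test that created it.
    static String tempTestDirectory(String testName) {
        return "temp-test-directory-" + testName + "-" + UUID.randomUUID();
    }
}
```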
[jira] [Commented] (FLINK-9413) Tasks can fail with PartitionNotFoundException if consumer deployment takes too long
[ https://issues.apache.org/jira/browse/FLINK-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336983#comment-17336983 ] Flink Jira Bot commented on FLINK-9413: --- This issue was labeled "stale-critical" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Critical, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Tasks can fail with PartitionNotFoundException if consumer deployment takes > too long > > > Key: FLINK-9413 > URL: https://issues.apache.org/jira/browse/FLINK-9413 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.4.0, 1.5.0, 1.6.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: pull-request-available, stale-critical > Time Spent: 10m > Remaining Estimate: 0h > > {{Tasks}} can fail with a {{PartitionNotFoundException}} if the deployment of > the producer takes too long. More specifically, if it takes longer than the > {{taskmanager.network.request-backoff.max}}, then the {{Task}} will give up > and fail. > The problem is that we calculate the {{InputGateDeploymentDescriptor}} for a > consuming task once the producer has been assigned a slot but we do not wait > until it is actually running. The problem should be fixed if we wait until > the task is in state {{RUNNING}} before assigning the result partition to the > consumer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-8379) Improve type checking for DataView
[ https://issues.apache.org/jira/browse/FLINK-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336671#comment-17336671 ] Flink Jira Bot commented on FLINK-8379: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Improve type checking for DataView > -- > > Key: FLINK-8379 > URL: https://issues.apache.org/jira/browse/FLINK-8379 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Reporter: Timo Walther >Priority: Major > Labels: stale-major > > At the moment an accumulator with no proper type information is a valid > accumulator. > {code} > public static class CountDistinctAccum { > public MapView map; > public long count; > } > {code} > I quickly looked into the code and it seems that MapView with type > information for key and value can be null. We should add a null check at the > correct position to inform the user about the non-existing type information. > We should also add the type check added with FLINK-8139 for the key type of > MapView. -- This message was sent by Atlassian Jira (v8.3.4#803005)
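The null check requested in FLINK-8379 above could look roughly like this. `TypeInfo` is a stand-in for Flink's `TypeInformation`, and the method name and message are assumptions; the real check would live in the aggregate-function validation path:

```java
class DataViewChecks {
    /** Minimal stand-in for Flink's TypeInformation. */
    static class TypeInfo {
        final String name;
        TypeInfo(String name) { this.name = name; }
    }

    // Fail early, naming the offending field, when a MapView-style
    // accumulator field carries no key/value type information.
    static void checkMapViewTypes(String field, TypeInfo keyType, TypeInfo valueType) {
        if (keyType == null || valueType == null) {
            throw new IllegalStateException(
                "Accumulator field '" + field + "' is a MapView without key/value "
                + "type information. Declare it with explicit types, e.g. "
                + "MapView<String, Integer>.");
        }
    }
}
```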
[jira] [Updated] (FLINK-16616) Drop BucketingSink
[ https://issues.apache.org/jira/browse/FLINK-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-16616: --- Labels: auto-deprioritized-major (was: stale-major) > Drop BucketingSink > -- > > Key: FLINK-16616 > URL: https://issues.apache.org/jira/browse/FLINK-16616 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem >Reporter: Robert Metzger >Priority: Major > Labels: auto-deprioritized-major > > (See this discussion for context: > https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E) > > The bucketing sink has been deprecated in the 1.9 release [2], because we > have had the new StreamingFileSink [3] for quite a while. > *The purpose of this ticket is to track all dependent tickets: If you think > something needs to be implemented before we can drop the BucketingSink, add > it as a "depends on" ticket* > [2] https://issues.apache.org/jira/browse/FLINK-13396 > [3] https://issues.apache.org/jira/browse/FLINK-9749 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-4902) Flink Task Chain not getting input in a distributed manner
[ https://issues.apache.org/jira/browse/FLINK-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-4902: -- Labels: auto-deprioritized-major (was: stale-major) > Flink Task Chain not getting input in a distributed manner > -- > > Key: FLINK-4902 > URL: https://issues.apache.org/jira/browse/FLINK-4902 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 1.1.0 > Environment: RHEL 6.6 >Reporter: Sajeev Ramakrishnan >Priority: Major > Labels: auto-deprioritized-major > > Dear Team, > I have the following tasks chained as a single subtask. > left outer join -> filter -> map -> flatMap. > The input to this would be two streams > memberPlan - 22 million > groupPlan - 1 million. > I am running the entire job with parallelism 16. Before this task chain, I am > doing two left outer joins. > The problem is that one slot is getting 22 plus million (includes some from > groupPlan) and the remaining 15 slots are getting input only from groupPlan. > This is making the entire execution very slow, probably 4 hours slower. > Can you please throw some light on this. > Regards, > Sajeev -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-6272) Rolling file sink saves incomplete lines on failure
[ https://issues.apache.org/jira/browse/FLINK-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-6272: -- Labels: auto-deprioritized-major (was: stale-major) > Rolling file sink saves incomplete lines on failure > --- > > Key: FLINK-6272 > URL: https://issues.apache.org/jira/browse/FLINK-6272 > Project: Flink > Issue Type: Bug > Components: Connectors / Common, Connectors / FileSystem >Affects Versions: 1.2.0 > Environment: Flink 1.2.0, Scala 2.11, Debian GNU/Linux 8.7 (jessie), > CDH 5.8, YARN >Reporter: Jakub Nowacki >Priority: Major > Labels: auto-deprioritized-major > > We have a simple pipeline with Kafka source (0.9), which transforms data and > writes to Rolling File Sink, which runs on YARN. The sink is a plain HDFS > sink with StringWriter configured as follows: > {code:java} > val fileSink = new BucketingSink[String]("some_path") > fileSink.setBucketer(new DateTimeBucketer[String]("-MM-dd")) > fileSink.setWriter(new StringWriter()) > fileSink.setBatchSize(1024 * 1024 * 1024) // this is 1 GB > {code} > Checkpoint is on. Both Kafka source and File sink are in theory with > [exactly-once > guarantee|https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html]. > On failure, in some files which seem to be complete (not {{in_progress}} > files or something, but under 1 GB and confirmed to be created on failure), > it turns out that the last line is cut. In our case it shows because we save > the data in line-by-line JSON and this creates an invalid JSON line. This > does not always happen, but I noticed at least 3 incidents like that. > Also, I am not sure if it is a separate bug but we see some data duplication > in this case coming from Kafka. I.e. after the pipeline is restarted, some > messages come out from the Kafka source which have already been saved > in the previous file. 
We can check that the messages are duplicated as they > have the same data but different timestamps, which are added within the Flink pipeline. > This should not happen in theory as the sink and source have [exactly-once > guarantee|https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
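A reader-side workaround for the truncated-line symptom in FLINK-6272 above can be sketched as follows. This is only a defensive consumer-side fix, not a fix for the sink itself: keep everything up to the last complete newline-terminated line and discard the possibly cut tail:

```java
class TruncateToLastNewline {
    // Return only the newline-terminated prefix of the file contents; a
    // record cut mid-line at failure time is dropped rather than parsed as
    // invalid JSON.
    static String completeLines(String fileContents) {
        int lastNewline = fileContents.lastIndexOf('\n');
        return lastNewline < 0 ? "" : fileContents.substring(0, lastNewline + 1);
    }
}
```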
[jira] [Updated] (FLINK-3983) Allow users to set any (relevant) configuration parameter of the KinesisProducerConfiguration
[ https://issues.apache.org/jira/browse/FLINK-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-3983: -- Priority: Minor (was: Major) > Allow users to set any (relevant) configuration parameter of the > KinesisProducerConfiguration > - > > Key: FLINK-3983 > URL: https://issues.apache.org/jira/browse/FLINK-3983 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common, Connectors / Kinesis >Affects Versions: 1.1.0 >Reporter: Robert Metzger >Priority: Minor > Labels: auto-deprioritized-major > > Currently, users can only set some of the configuration parameters in the > {{KinesisProducerConfiguration}} through Properties. > It would be good to introduce configuration keys for these parameters so that users > can change the producer configuration. > I think these and most of the other variables in the > KinesisProducerConfiguration should be exposed via properties: > - aggregationEnabled > - collectionMaxCount > - collectionMaxSize > - connectTimeout > - credentialsRefreshDelay > - failIfThrottled > - logLevel > - metricsGranularity > - metricsLevel > - metricsNamespace > - metricsUploadDelay -- This message was sent by Atlassian Jira (v8.3.4#803005)
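The property plumbing requested in FLINK-3983 above could be sketched like this. The key names and default values here are assumptions for illustration; the real connector would forward the parsed values to the corresponding `KinesisProducerConfiguration` setters:

```java
import java.util.Properties;

class KinesisProducerProps {
    // Read a user-supplied key from the Properties, falling back to a
    // (hypothetical) default when absent.
    static boolean aggregationEnabled(Properties props) {
        return Boolean.parseBoolean(
            props.getProperty("aws.producer.aggregationEnabled", "true"));
    }

    static long collectionMaxCount(Properties props) {
        return Long.parseLong(
            props.getProperty("aws.producer.collectionMaxCount", "500"));
    }
}
```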
[jira] [Commented] (FLINK-10224) Web frontend does not handle scientific notation properly
[ https://issues.apache.org/jira/browse/FLINK-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336588#comment-17336588 ] Flink Jira Bot commented on FLINK-10224: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Web frontend does not handle scientific notation properly > - > > Key: FLINK-10224 > URL: https://issues.apache.org/jira/browse/FLINK-10224 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Affects Versions: 1.5.3 >Reporter: Paul Lin >Priority: Major > Labels: stale-major > Attachments: task metrics panel.png > > > Task metrics can be large numbers which would be converted into scientific > notation by the REST server, but the web frontend does not handle them properly. > For instance, below is a panel for Kafka consumer metric "records-lag-max", > which is actually "9.1839984E7". > !task metrics panel.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
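The expansion the frontend is missing in FLINK-10224 above can be sketched with plain JDK arithmetic (the actual web frontend is JavaScript, so this is only a sketch of the conversion, not of where the fix would live):

```java
import java.math.BigDecimal;

class MetricFormat {
    // Expand a scientific-notation metric string such as "9.1839984E7" into
    // plain digits before display.
    static String toPlainString(String metric) {
        return new BigDecimal(metric).toPlainString();
    }
}
```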
[jira] [Updated] (FLINK-8107) UNNEST causes cyclic type checking exception
[ https://issues.apache.org/jira/browse/FLINK-8107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-8107: -- Priority: Minor (was: Major) > UNNEST causes cyclic type checking exception > > > Key: FLINK-8107 > URL: https://issues.apache.org/jira/browse/FLINK-8107 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Affects Versions: 1.4.0 >Reporter: Timo Walther >Priority: Minor > Labels: auto-deprioritized-major > > The following query causes an assertion error: > {code} > def main(args: Array[String]): Unit = { > val env = ExecutionEnvironment.getExecutionEnvironment > val tEnv = TableEnvironment.getTableEnvironment(env) > val input2 = env.fromElements( > WC("hello", 1, Array(1, 2, 3)), > WC("hello", 1, Array(1, 2, 3)), > WC("ciao", 1, Array(1, 2, 3)) > ) > tEnv.registerDataSet("entity", input2) > tEnv.registerDataSet("product", input2, 'product) > val table = tEnv.sqlQuery("SELECT t.item.* FROM product, > UNNEST(entity.myarr) AS t (item)") > table.toDataSet[Row].print() > } > case class WC(word: String, frequency: Long, myarr: Array[Int]) > {code} > It leads to: > {code} > Exception in thread "main" java.lang.AssertionError: Cycle detected during > type-checking > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:93) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:945) > at > org.apache.calcite.sql.validate.AbstractNamespace.getRowType(AbstractNamespace.java:115) > at > org.apache.calcite.sql.validate.AbstractNamespace.getRowTypeSansSystemColumns(AbstractNamespace.java:122) > at > org.apache.calcite.sql.validate.AliasNamespace.validateImpl(AliasNamespace.java:71) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:945) > at > 
org.apache.calcite.sql.validate.AbstractNamespace.getRowType(AbstractNamespace.java:115) > at > org.apache.calcite.sql.validate.AliasNamespace.getRowType(AliasNamespace.java:41) > at > org.apache.calcite.sql.validate.DelegatingScope.resolveInNamespace(DelegatingScope.java:101) > at org.apache.calcite.sql.validate.ListScope.resolve(ListScope.java:191) > at > org.apache.calcite.sql.validate.ListScope.findQualifyingTableNames(ListScope.java:156) > at > org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:326) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateIdentifier(SqlValidatorImpl.java:2785) > at > org.apache.calcite.sql.SqlIdentifier.validateExpr(SqlIdentifier.java:324) > at org.apache.calcite.sql.SqlOperator.validateCall(SqlOperator.java:407) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateCall(SqlValidatorImpl.java:5084) > at > org.apache.calcite.sql.validate.UnnestNamespace.validateImpl(UnnestNamespace.java:52) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:945) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:926) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:2961) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:2946) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateJoin(SqlValidatorImpl.java:2998) > at > org.apache.flink.table.calcite.FlinkCalciteSqlValidator.validateJoin(FlinkCalciteSqlValidator.scala:67) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:2955) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3206) > at > org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60) > at > 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:945) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:926) > at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:226) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:901) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:611) > at >
[jira] [Commented] (FLINK-9689) Flink consumer deserialization example
[ https://issues.apache.org/jira/browse/FLINK-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336616#comment-17336616 ] Flink Jira Bot commented on FLINK-9689: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Flink consumer deserialization example > -- > > Key: FLINK-9689 > URL: https://issues.apache.org/jira/browse/FLINK-9689 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Reporter: Satheesh >Priority: Major > Labels: stale-major > > It's hard to find a relevant custom deserialization example for Flink Kafka > consumer. It would be very useful to add a sample program for implementing > custom deserialization in the blink-examples folder. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-9469) Add tests that cover PatternStream#flatSelect
[ https://issues.apache.org/jira/browse/FLINK-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-9469: -- Labels: auto-deprioritized-major pull-request-available (was: pull-request-available stale-major) > Add tests that cover PatternStream#flatSelect > - > > Key: FLINK-9469 > URL: https://issues.apache.org/jira/browse/FLINK-9469 > Project: Flink > Issue Type: Improvement > Components: Library / CEP >Reporter: Dawid Wysakowicz >Priority: Major > Labels: auto-deprioritized-major, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-8294) Missing examples/links in Data Sink docs
[ https://issues.apache.org/jira/browse/FLINK-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-8294: -- Priority: Minor (was: Major) > Missing examples/links in Data Sink docs > > > Key: FLINK-8294 > URL: https://issues.apache.org/jira/browse/FLINK-8294 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.4.0 >Reporter: Julio Biason >Priority: Minor > Labels: auto-deprioritized-major > > In the [Data > Sink|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/datastream_api.html#data-sinks] > documentation, there is no example on how to use said functions -- even if > they are only intended for debugging (which is exactly what I want to do right > now). > While {{print}} is quite simple, what I need is to get the resulting > processing, so I'd probably need some of the {{write}} functions (since > FLINK-8285 mentions that iterators are out). > I'd either suggest adding the examples or link the listed functions to their > documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes
[ https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-4785: -- Labels: auto-deprioritized-major csv (was: csv stale-major) > Flink string parser doesn't handle string fields containing two consecutive > double quotes > - > > Key: FLINK-4785 > URL: https://issues.apache.org/jira/browse/FLINK-4785 > Project: Flink > Issue Type: Improvement > Components: API / DataSet >Affects Versions: 1.1.2 >Reporter: Flavio Pompermaier >Priority: Major > Labels: auto-deprioritized-major, csv > > To reproduce the error run > https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/flink/Csv2RowExample.java -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-3984) Event time of stream transformations is undocumented
[ https://issues.apache.org/jira/browse/FLINK-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-3984: -- Priority: Minor (was: Major) > Event time of stream transformations is undocumented > > > Key: FLINK-3984 > URL: https://issues.apache.org/jira/browse/FLINK-3984 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Documentation >Affects Versions: 1.0.3 >Reporter: Elias Levy >Priority: Minor > Labels: auto-deprioritized-major > > The Event Time, Windowing, and DataStream Transformation documentation > sections fail to state what event time, if any, the output of transformations > has on a stream that is configured to use event time and that has timestamp > assigners. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-6866) ClosureCleaner.clean fails for scala's JavaConverters wrapper classes
[ https://issues.apache.org/jira/browse/FLINK-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-6866: -- Priority: Minor (was: Major) > ClosureCleaner.clean fails for scala's JavaConverters wrapper classes > - > > Key: FLINK-6866 > URL: https://issues.apache.org/jira/browse/FLINK-6866 > Project: Flink > Issue Type: Bug > Components: API / DataStream, API / Scala >Affects Versions: 1.2.0, 1.3.0 > Environment: Scala 2.10.6, Scala 2.11.11 > Does not appear using Scala 2.12 >Reporter: SmedbergM >Priority: Minor > Labels: auto-deprioritized-major > > MWE: https://github.com/SmedbergM/ClosureCleanerBug > MWE console output: > https://gist.github.com/SmedbergM/ce969e6e8540da5b59c7dd921a496dc5 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-5719) Let LatencyMarkers completely bypass operators / chains
[ https://issues.apache.org/jira/browse/FLINK-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336806#comment-17336806 ] Flink Jira Bot commented on FLINK-5719: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Let LatencyMarkers completely bypass operators / chains > --- > > Key: FLINK-5719 > URL: https://issues.apache.org/jira/browse/FLINK-5719 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: Tzu-Li (Gordon) Tai >Priority: Major > Labels: stale-major > > Currently, {{LatencyMarker}} s are forwarded through operators via the > operator interfaces and methods, i.e. > {{AbstractStreamOperator#processLatencyMarker()}}, > {{Output#emitLatencyMarker()}}, > {{OneInputStreamOperator#processLatencyMarker()}} etc. > The main issue with this is that {{LatencyMarker}} s are essentially internal > elements, and the implementation on how to handle them should be final. > Exposing them through operator interfaces will allow the user to override the > implementation, and also makes the user interface for operators > over-complicated. > [~aljoscha] suggested to bypass such internal stream elements from the > operator to keep the operator interfaces minimal, in FLINK-5017. > We propose a similar approach here for {{LatencyMarker}} as well. Since the > chaining output calls contribute very little to the measured latency and can > be ignored, instead of passing it through operator chains, latency markers > can simply be passed downstream once tasks receive them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-4001) Add event time support to filesystem connector
[ https://issues.apache.org/jira/browse/FLINK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336866#comment-17336866 ] Flink Jira Bot commented on FLINK-4001: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Add event time support to filesystem connector > -- > > Key: FLINK-4001 > URL: https://issues.apache.org/jira/browse/FLINK-4001 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common, Connectors / FileSystem >Reporter: Robert Metzger >Priority: Major > Labels: stale-major > > Currently, the file system connector (rolling file sink) does not respect the > event time of records. > For full reprocessing capabilities, we need to make the sink aware of the > event time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-7001) Improve performance of Sliding Time Window with pane optimization
[ https://issues.apache.org/jira/browse/FLINK-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336733#comment-17336733 ] Flink Jira Bot commented on FLINK-7001: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Improve performance of Sliding Time Window with pane optimization > - > > Key: FLINK-7001 > URL: https://issues.apache.org/jira/browse/FLINK-7001 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: Jark Wu >Priority: Major > Labels: stale-major > > Currently, the implementation of time-based sliding windows treats each > window individually and replicates records to each window. For a window of 10 > minute size that slides by 1 second the data is replicated 600 fold (10 > minutes / 1 second). We can optimize sliding windows by dividing them into > panes (aligned with the slide), so that we can avoid record duplication and > leverage checkpointing. > I will attach a more detailed design doc to the issue. > The following issues are similar to this issue: FLINK-5387, FLINK-6990 -- This message was sent by Atlassian Jira (v8.3.4#803005)
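The pane idea in FLINK-7001 above can be illustrated with a minimal sum aggregation: each record is added to exactly one pane (aligned with the slide), and each sliding window's result is obtained by combining pane results, instead of replicating every record into every overlapping window. This is a standalone sketch of the arithmetic, not Flink's windowing code:

```java
class PaneOptimization {
    // paneSums[i] is the pre-aggregated sum of pane i. Returns the sums of
    // all sliding windows that cover `windowPanes` consecutive panes, using a
    // running total so each pane is touched O(1) times.
    static long[] slidingSums(long[] paneSums, int windowPanes) {
        long[] out = new long[paneSums.length - windowPanes + 1];
        long running = 0;
        for (int i = 0; i < paneSums.length; i++) {
            running += paneSums[i];                         // pane enters window
            if (i >= windowPanes) running -= paneSums[i - windowPanes]; // pane leaves
            if (i >= windowPanes - 1) out[i - windowPanes + 1] = running;
        }
        return out;
    }
}
```

For the 10-minute / 1-second example in the ticket, this replaces 600-fold record replication with one pane update per record plus one combine per emitted window.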
[jira] [Updated] (FLINK-5094) Support RichReduceFunction and RichFoldFunction as incremental window aggregation functions
[ https://issues.apache.org/jira/browse/FLINK-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-5094: -- Priority: Minor (was: Major) > Support RichReduceFunction and RichFoldFunction as incremental window > aggregation functions > --- > > Key: FLINK-5094 > URL: https://issues.apache.org/jira/browse/FLINK-5094 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Affects Versions: 1.1.3, 1.2.0 >Reporter: Fabian Hueske >Priority: Minor > Labels: auto-deprioritized-major > > Support {{RichReduceFunction}} and {{RichFoldFunction}} as incremental window > aggregation functions in order to initialize the functions via {{open()}}. > The main problem is that we do not want to provide the full power of > {{RichFunction}} for incremental aggregation functions, such as defining own > operator state. This could be achieved by providing some kind of > {{RestrictedRuntimeContext}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-13395) Add source and sink connector for Alibaba Log Service
[ https://issues.apache.org/jira/browse/FLINK-13395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17336458#comment-17336458 ] Flink Jira Bot commented on FLINK-13395: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Add source and sink connector for Alibaba Log Service > - > > Key: FLINK-13395 > URL: https://issues.apache.org/jira/browse/FLINK-13395 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common >Reporter: Ke Li >Priority: Major > Labels: stale-major > > Alibaba Log Service is a big data service which has been widely used in > Alibaba Group and thousands of customers of Alibaba Cloud. The core storage > engine of Log Service is named Loghub, a large scale distributed > storage system which provides producers and consumers to push and pull data > like Kafka, AWS Kinesis and Azure Eventhub do. > Log Service provides a complete solution to help users collect data from both > on premise and cloud data sources. More than 10 PB data is sent to and > consumed from Loghub every day. And hundreds of thousands of users > implemented their DevOps and big data systems based on Log Service. > Log Service and Flink/Blink have become the de facto standard of big data > architecture for unified data processing in Alibaba Group and more users of > Alibaba Cloud. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-7271) ExpressionReducer does not optimize string-to-time conversion
[ https://issues.apache.org/jira/browse/FLINK-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-7271: -- Labels: auto-deprioritized-major (was: stale-major) > ExpressionReducer does not optimize string-to-time conversion > - > > Key: FLINK-7271 > URL: https://issues.apache.org/jira/browse/FLINK-7271 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.3.1 >Reporter: Timo Walther >Priority: Major > Labels: auto-deprioritized-major > > Expressions like {{"1996-11-10".toDate}} or {{"1996-11-10 > 12:12:12".toTimestamp}} are not recognized by the ExpressionReducer and are > evaluated during runtime instead of pre-flight phase. In order to optimize > the runtime we should allow constant expression reduction here. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-16881) use Catalog's total size info in planner
[ https://issues.apache.org/jira/browse/FLINK-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17336250#comment-17336250 ] Flink Jira Bot commented on FLINK-16881: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > use Catalog's total size info in planner > > > Key: FLINK-16881 > URL: https://issues.apache.org/jira/browse/FLINK-16881 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner >Reporter: godfrey he >Priority: Major > Labels: stale-major > > In some cases, {{Catalog}} only contains {{totalSize}} and the row count is > unknown. We can also use {{totalSize}} to infer the row count, or even use > {{totalSize}} to decide whether the join is a broadcast join -- This message was sent by Atlassian Jira (v8.3.4#803005)
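The inference FLINK-16881 asks for can be sketched in a few lines. The average-row-width estimate and the broadcast threshold below are assumptions for illustration, not Flink planner API:

```java
// Illustrative sketch: derive a row-count estimate from the catalog's
// totalSize, and use the byte size directly for a broadcast-join decision.
public class SizeStats {
    /** Rough row count from total bytes and an estimated average row width. */
    static long inferRowCount(long totalSizeBytes, long avgRowBytes) {
        return Math.max(1, totalSizeBytes / Math.max(1, avgRowBytes));
    }

    /** Broadcast the build side only if it fits under a size threshold. */
    static boolean useBroadcastJoin(long buildSideBytes, long thresholdBytes) {
        return buildSideBytes <= thresholdBytes;
    }

    public static void main(String[] args) {
        long totalSize = 64L * 1024 * 1024; // 64 MB reported by the catalog
        System.out.println(inferRowCount(totalSize, 128));                    // 524288
        System.out.println(useBroadcastJoin(totalSize, 10L * 1024 * 1024));   // false
    }
}
```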
[jira] [Updated] (FLINK-20427) Remove CheckpointConfig.setPreferCheckpointForRecovery because it can lead to data loss
[ https://issues.apache.org/jira/browse/FLINK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-20427: --- Priority: Major (was: Critical) > Remove CheckpointConfig.setPreferCheckpointForRecovery because it can lead to > data loss > --- > > Key: FLINK-20427 > URL: https://issues.apache.org/jira/browse/FLINK-20427 > Project: Flink > Issue Type: Bug > Components: API / DataStream, Runtime / Checkpointing >Affects Versions: 1.12.0 >Reporter: Till Rohrmann >Priority: Major > Labels: auto-deprioritized-critical > Fix For: 1.14.0 > > > {{CheckpointConfig.setPreferCheckpointForRecovery}} allows configuring > whether Flink prefers checkpoints for recovery if the > {{CompletedCheckpointStore}} contains savepoints and checkpoints. This is > problematic because due to this feature, Flink might prefer older checkpoints > over newer savepoints for recovery. Since some components expect that the > latest checkpoint/savepoint is always used (e.g. the > {{SourceCoordinator}}), it breaks assumptions and can lead to > {{SourceSplits}} which are not read. This effectively means that the system > loses data. Similarly, this behaviour can cause exactly-once sinks to > output results multiple times, which violates the processing guarantees. > Hence, I believe that we should remove this setting because it changes > Flink's behaviour in a very significant way, potentially w/o the user > noticing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-9123) Scala version of ProcessFunction doesn't work
[ https://issues.apache.org/jira/browse/FLINK-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-9123: -- Priority: Minor (was: Major) > Scala version of ProcessFunction doesn't work > - > > Key: FLINK-9123 > URL: https://issues.apache.org/jira/browse/FLINK-9123 > Project: Flink > Issue Type: Bug > Components: API / DataStream, API / Scala, Documentation >Reporter: Julio Biason >Priority: Minor > Labels: auto-deprioritized-major > > The source code example of ProcessFunction doesn't compile: > > {code:java} > value Context is not a member of object > org.apache.flink.streaming.api.functions.ProcessFunction > [error] import > org.apache.flink.streaming.api.functions.ProcessFunction.Context > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-5086) Clean dead snapshot files produced by the tasks failing to acknowledge checkpoints
[ https://issues.apache.org/jira/browse/FLINK-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-5086: -- Labels: auto-deprioritized-major (was: stale-major) > Clean dead snapshot files produced by the tasks failing to acknowledge > checkpoints > -- > > Key: FLINK-5086 > URL: https://issues.apache.org/jira/browse/FLINK-5086 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Reporter: Xiaogang Shi >Priority: Major > Labels: auto-deprioritized-major > > A task may fail when performing checkpoints. In that case, the task may have > already copied some data to external storage. But since the task fails to > send the state handler to {{CheckpointCoordinator}}, the copied data will not > be deleted by {{CheckpointCoordinator}}. > I think we must find a method to clean such dead snapshot data to avoid > unlimited usage of external storage. > One possible method is to clean these dead files when the task recovers. When > a task recovers, {{CheckpointCoordinator}} will tell the task all the > retained checkpoints. The task then can scan the external storage to delete > all the snapshots not in these retained checkpoints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-12294) Kafka connector, work with grouping partitions
[ https://issues.apache.org/jira/browse/FLINK-12294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-12294: --- Labels: auto-deprioritized-major performance (was: performance stale-major) > Kafka connector, work with grouping partitions > -- > > Key: FLINK-12294 > URL: https://issues.apache.org/jira/browse/FLINK-12294 > Project: Flink > Issue Type: New Feature > Components: API / DataStream, Connectors / Kafka, Runtime / Task >Reporter: Sergey >Priority: Major > Labels: auto-deprioritized-major, performance > Attachments: KeyGroupAssigner.java, KeyGroupRangeAssignment.java > > > Add an additional flag (with default false value) controlling whether topic > partitions are already grouped by the key, and exclude the unnecessary shuffle/resorting > operation when this parameter is set to true. As an example, say we have > clients' payment transactions in a Kafka topic. We group by clientId > (transactions with the same clientId go to one Kafka topic partition) and > the task is to find the max transaction per client in sliding windows. In terms > of map\reduce there is no need to shuffle data between all topic consumers; > maybe it's worth doing it within each consumer to gain some speedup due to > an increased number of executors within each partition's data. With N messages > (in a partition) it will be just N operations instead of N*ln(N) (the current > realization with shuffle/resorting). For windows with thousands of events, > that is a tenfold gain in execution speed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
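The claim in FLINK-12294 — that per-client aggregation needs no shuffle when the topic is already partitioned by clientId — can be sketched as a single O(N) pass over one partition's records. The names below are illustrative plain Java, not connector API:

```java
// Illustrative sketch: because all records for a given clientId live in
// one partition, each consumer can compute the per-client max locally in
// one pass, with no cross-consumer shuffle or resorting.
import java.util.HashMap;
import java.util.Map;

public class LocalMax {
    /** One pass over a single partition's records: clientId -> max amount. */
    static Map<String, Long> maxPerClient(String[] clientIds, long[] amounts) {
        Map<String, Long> max = new HashMap<>();
        for (int i = 0; i < clientIds.length; i++) {
            max.merge(clientIds[i], amounts[i], Math::max);
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, Long> m = maxPerClient(
            new String[]{"a", "b", "a"}, new long[]{3, 7, 5});
        System.out.println(m.get("a")); // 5
        System.out.println(m.get("b")); // 7
    }
}
```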
[jira] [Updated] (FLINK-11942) Flink kinesis connector throws kinesis producer daemon fatalError
[ https://issues.apache.org/jira/browse/FLINK-11942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11942: --- Priority: Minor (was: Major) > Flink kinesis connector throws kinesis producer daemon fatalError > - > > Key: FLINK-11942 > URL: https://issues.apache.org/jira/browse/FLINK-11942 > Project: Flink > Issue Type: Bug > Components: Connectors / Kinesis >Affects Versions: 1.7.2 >Reporter: indraneel r >Priority: Minor > Labels: auto-deprioritized-major > > Flink connector crashes repeatedly with following error: > {quote}437062 [kpl-callback-pool-28-thread-0] WARN > org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer - An > exception occurred while processing a record > java.lang.RuntimeException: Unexpected error > at > org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:533) > at > org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.fatalError(Daemon.java:513) > at > org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon.add(Daemon.java:183) > at > org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:536) > at > org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554) > at > org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696) > at > org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51) > at > com.myorg.bi.web.sessionization.windowing.SessionProcessingFunction$$anonfun$process$2.apply(SessionProcessingFunction.scala:37) > at > 
com.myorg.bi.web.sessionization.windowing.SessionProcessingFunction$$anonfun$process$2.apply(SessionProcessingFunction.scala:33) > at scala.collection.immutable.Stream.foreach(Stream.scala:594) > at > com.myorg.bi.web.sessionization.windowing.SessionProcessingFunction.process(SessionProcessingFunction.scala:33) > at > com.myorg.bi.web.sessionization.windowing.SessionProcessingFunction.process(SessionProcessingFunction.scala:13) > at > org.apache.flink.streaming.api.scala.function.util.ScalaProcessWindowFunctionWrapper.process(ScalaProcessWindowFunctionWrapper.scala:63) > at > org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:50) > at > org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:32) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:546) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onEventTime(WindowOperator.java:454) > at > org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.advanceWatermark(InternalTimerServiceImpl.java:251) > at > org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:128) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:775) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262) > at > org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189) > at > org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111) > at > 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.InterruptedException > at >
[jira] [Commented] (FLINK-2147) Approximate calculation of frequencies in data streams
[ https://issues.apache.org/jira/browse/FLINK-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17336921#comment-17336921 ] Flink Jira Bot commented on FLINK-2147: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Approximate calculation of frequencies in data streams > -- > > Key: FLINK-2147 > URL: https://issues.apache.org/jira/browse/FLINK-2147 > Project: Flink > Issue Type: New Feature > Components: API / DataStream >Reporter: Gábor Gévay >Priority: Major > Labels: approximate, stale-major, statistics > > Count-Min sketch is a hashing-based algorithm for approximately keeping track > of the frequencies of elements in a data stream. It is described by Cormode > et al. in the following paper: > http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf > Note that this algorithm can be conveniently implemented in a distributed > way, as described in section 3.2 of the paper. > The paper > http://www.vldb.org/conf/2002/S10P03.pdf > also describes algorithms for approximately keeping track of frequencies, but > here the user can specify a threshold below which she is not interested in > the frequency of an element. The error bounds are also different from those of > the Count-Min sketch algorithm. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-8470) DelayTrigger and DelayAndCountTrigger in Flink Streaming Window API
[ https://issues.apache.org/jira/browse/FLINK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17336664#comment-17336664 ] Flink Jira Bot commented on FLINK-8470: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > DelayTrigger and DelayAndCountTrigger in Flink Streaming Window API > --- > > Key: FLINK-8470 > URL: https://issues.apache.org/jira/browse/FLINK-8470 > Project: Flink > Issue Type: New Feature > Components: API / DataStream >Affects Versions: 2.0.0 >Reporter: Vijay Kansal >Priority: Major > Labels: stale-major > Original Estimate: 24h > Remaining Estimate: 24h > > In the Flink streaming API, we do not have any built-in window triggers > available for the below use cases: > 1. DelayTrigger: the window function should trigger in case the 1st element > belonging to this window arrived more than maxDelay(ms) before the current > processing time. > 2. DelayAndCountTrigger: the window function should trigger in case the 1st > element belonging to this window arrived more than maxDelay(ms) before the > current processing time, or there are more than maxCount elements in the > window. -- This message was sent by Atlassian Jira (v8.3.4#803005)
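The two trigger conditions from FLINK-8470 can be sketched as plain-Java decision logic. This is illustrative only; Flink's real `Trigger` interface has a richer lifecycle (timers, state, merge support) that a production trigger would have to implement:

```java
// Illustrative sketch of the proposed firing decisions: fire when the
// first element of the window is older than maxDelay, or (for the
// DelayAndCount variant) when the element count exceeds maxCount.
public class DelayTriggerLogic {
    final long maxDelayMs;
    final long maxCount;         // Long.MAX_VALUE for the pure delay trigger
    long firstElementTime = -1;  // processing time of the window's 1st element
    long count = 0;

    DelayTriggerLogic(long maxDelayMs, long maxCount) {
        this.maxDelayMs = maxDelayMs;
        this.maxCount = maxCount;
    }

    /** Returns true if the window should fire after this element. */
    boolean onElement(long processingTimeMs) {
        if (firstElementTime < 0) firstElementTime = processingTimeMs;
        count++;
        return processingTimeMs - firstElementTime > maxDelayMs || count > maxCount;
    }

    public static void main(String[] args) {
        DelayTriggerLogic t = new DelayTriggerLogic(1_000, 3);
        System.out.println(t.onElement(0));     // false: first element
        System.out.println(t.onElement(500));   // false: within delay, count 2
        System.out.println(t.onElement(1_200)); // true: 1200 - 0 > 1000
    }
}
```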
[jira] [Updated] (FLINK-5901) DAG can not show properly in IE
[ https://issues.apache.org/jira/browse/FLINK-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-5901: -- Priority: Major (was: Critical) > DAG can not show properly in IE > --- > > Key: FLINK-5901 > URL: https://issues.apache.org/jira/browse/FLINK-5901 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend > Environment: IE 11 >Reporter: Tao Wang >Priority: Major > Labels: auto-deprioritized-critical > Attachments: using IE.png, using chrom(same job).png > > > The DAG of running jobs can not show properly in IE11 (I am using > 11.0.9600.18059, but I assume the same for IE9). The description of the task is > not shown within the rectangle. > Chrome works fine. I pasted the screenshots under IE and Chrome below. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-5094) Support RichReduceFunction and RichFoldFunction as incremental window aggregation functions
[ https://issues.apache.org/jira/browse/FLINK-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-5094: -- Labels: auto-deprioritized-major (was: stale-major) > Support RichReduceFunction and RichFoldFunction as incremental window > aggregation functions > --- > > Key: FLINK-5094 > URL: https://issues.apache.org/jira/browse/FLINK-5094 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Affects Versions: 1.1.3, 1.2.0 >Reporter: Fabian Hueske >Priority: Major > Labels: auto-deprioritized-major > > Support {{RichReduceFunction}} and {{RichFoldFunction}} as incremental window > aggregation functions in order to initialize the functions via {{open()}}. > The main problem is that we do not want to provide the full power of > {{RichFunction}} for incremental aggregation functions, such as defining their own > operator state. This could be achieved by providing some kind of > {{RestrictedRuntimeContext}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-10031) Support handling late events in Table API & SQL
[ https://issues.apache.org/jira/browse/FLINK-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-10031: --- Priority: Minor (was: Major) > Support handling late events in Table API & SQL > --- > > Key: FLINK-10031 > URL: https://issues.apache.org/jira/browse/FLINK-10031 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Reporter: Timo Walther >Priority: Minor > Labels: auto-deprioritized-major > > The Table API & SQL drop late events right now. We should offer something > like a side channel that allows capturing late events for separate > processing. For example, this could either be simply a table sink or a > special table (in Table API) derived from the original table that allows > further processing. The exact design needs to be discussed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-5683) Fix RedisConnector documentation (Java)
[ https://issues.apache.org/jira/browse/FLINK-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-5683: -- Labels: auto-deprioritized-major redis (was: redis stale-major) > Fix RedisConnector documentation (Java) > --- > > Key: FLINK-5683 > URL: https://issues.apache.org/jira/browse/FLINK-5683 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common, Documentation >Affects Versions: 1.1.3, 1.1.4 >Reporter: Vikram Rawat >Priority: Major > Labels: auto-deprioritized-major, redis > > The RedisConnector documentation for Java can be improved. > https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/connectors/redis.html > > The code snippet on the site is: > DataStream<String> stream = ...; > stream.addSink(new RedisSink<Tuple2<String, String>>(conf, new > RedisExampleMapper())); > This gives a compile-time error in the IDE. It should be changed to: > DataStream<Tuple2<String, String>> stream = ...; > stream.addSink(new RedisSink<Tuple2<String, String>>(conf, new > RedisExampleMapper())); > The code snippets on the project website must be accurate, so I recommend this > change. And it will only take a few minutes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-13695) Integrate checkpoint notifications into StreamTask's lifecycle
[ https://issues.apache.org/jira/browse/FLINK-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-13695: --- Priority: Major (was: Critical) > Integrate checkpoint notifications into StreamTask's lifecycle > -- > > Key: FLINK-13695 > URL: https://issues.apache.org/jira/browse/FLINK-13695 > Project: Flink > Issue Type: Bug > Components: Runtime / Task >Affects Versions: 1.9.0, 1.10.0 >Reporter: Till Rohrmann >Priority: Major > Labels: auto-deprioritized-critical > > The {{StreamTask's}} asynchronous checkpoint notifications are decoupled from > the {{StreamTask's}} lifecycle. Consequently, it can happen that a > {{StreamTask}} is terminating/cancelling and still sends asynchronous > checkpoint notifications (e.g. acknowledge/decline checkpoint notifications). > This is problematic because a cancelling/terminating {{StreamTask}} might > cause an asynchronous checkpoint to fail which is expected and should not be > reported to the {{JobMaster}}. > Hence, the checkpoint notifications should be coupled with the > {{StreamTask's}} lifecycle (e.g. notifications must be sent from the > {{StreamTask's}} main thread) and if not valid, then they need to be filtered > out/suppressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-3588) Add a streaming (exactly-once) JDBC connector
[ https://issues.apache.org/jira/browse/FLINK-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-3588: -- Priority: Minor (was: Major) > Add a streaming (exactly-once) JDBC connector > - > > Key: FLINK-3588 > URL: https://issues.apache.org/jira/browse/FLINK-3588 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common >Affects Versions: 1.0.0 >Reporter: Chesnay Schepler >Priority: Minor > Labels: auto-deprioritized-major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-4621) Improve decimal literals of SQL API
[ https://issues.apache.org/jira/browse/FLINK-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-4621: -- Priority: Minor (was: Major) > Improve decimal literals of SQL API > --- > > Key: FLINK-4621 > URL: https://issues.apache.org/jira/browse/FLINK-4621 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Reporter: Timo Walther >Priority: Minor > Labels: auto-deprioritized-major > > Currently, all SQL {{DECIMAL}} types are converted to BigDecimals internally. > By default, the SQL parser creates {{DECIMAL}} literals of any number e.g. > {{SELECT 1.0, 12, -0.5 FROM x}}. I think it would be better if these simple > numbers were represented as Java primitives instead of objects. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-9802) Harden End-to-end tests against download failures
[ https://issues.apache.org/jira/browse/FLINK-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-9802: -- Labels: auto-deprioritized-major (was: stale-major) > Harden End-to-end tests against download failures > - > > Key: FLINK-9802 > URL: https://issues.apache.org/jira/browse/FLINK-9802 > Project: Flink > Issue Type: Improvement > Components: Tests >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Priority: Major > Labels: auto-deprioritized-major > > Several end-to-end tests download libraries (kafka, zookeeper, elasticsearch) > to set them up locally for testing purposes. Currently (at least for the > elasticsearch test), we do not guard against failed downloads. > We should do a sweep over all tests and harden them against download > failures, by retrying the download on failure, and explicitly exiting if the > download did not succeed after N attempts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
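The hardening proposed in FLINK-9802 boils down to a bounded retry loop that fails explicitly after N attempts. A minimal Java sketch, where the `BooleanSupplier` stands in for the actual download command (curl/wget) used by the test scripts:

```java
// Illustrative sketch: retry an action up to maxAttempts times and
// report failure explicitly if no attempt succeeds.
import java.util.function.BooleanSupplier;

public class Retry {
    /** Returns true if the action succeeded within maxAttempts tries. */
    static boolean retry(int maxAttempts, BooleanSupplier action) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (action.getAsBoolean()) return true;
            System.err.println("attempt " + i + "/" + maxAttempts + " failed");
        }
        return false; // caller should exit the test with an error here
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // simulate a download that fails twice, then succeeds
        boolean ok = retry(5, () -> ++calls[0] >= 3);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```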
[jira] [Commented] (FLINK-7624) Add kafka-topic for "KafkaProducer" metrics
[ https://issues.apache.org/jira/browse/FLINK-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17336707#comment-17336707 ] Flink Jira Bot commented on FLINK-7624: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Add kafka-topic for "KafkaProducer" metrics > --- > > Key: FLINK-7624 > URL: https://issues.apache.org/jira/browse/FLINK-7624 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Runtime / Metrics >Reporter: Hai Zhou >Priority: Major > Labels: stale-major > > Currently, metrics in the "KafkaProducer" MetricGroup look like: > {code:java} > localhost.taskmanager.dc4092a96ea4e54ecdbd13b9a5c209b2.Flink Streaming > Job.Sink--MTKafkaProducer08.0.KafkaProducer.record-queue-time-avg > {code} > The metric names in the "KafkaProducer" group do not have a kafka-topic name > part; if the job writes data to two different Kafka sinks, these metrics > cannot be distinguished. > I wish to modify the above metric name as follows: > {code:java} > localhost.taskmanager.dc4092a96ea4e54ecdbd13b9a5c209b2.Flink Streaming > Job.Sink--MTKafkaProducer08.0.KafkaProducer.<kafka > topic>.record-queue-time-avg > {code} > Best, > Hai Zhou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-6626) Unifying lifecycle management of SubmittedJobGraph- and CompletedCheckpointStore
[ https://issues.apache.org/jira/browse/FLINK-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-6626: -- Labels: auto-deprioritized-major (was: stale-major) > Unifying lifecycle management of SubmittedJobGraph- and > CompletedCheckpointStore > > > Key: FLINK-6626 > URL: https://issues.apache.org/jira/browse/FLINK-6626 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Priority: Major > Labels: auto-deprioritized-major > > Currently, Flink uses the {{SubmittedJobGraphStore}} to persist {{JobGraphs}} > such that they can be recovered in case of failures. The > {{SubmittedJobGraphStore}} is managed by the {{JobManager}}. Additionally, > Flink has the {{CompletedCheckpointStore}} which stores checkpoints for a > given {{ExecutionGraph}}/job. The {{CompletedCheckpointStore}} is managed by > the {{CheckpointCoordinator}}. > The {{SubmittedJobGraphStore}} and the {{CompletedCheckpointStore}} are > somewhat related because in the latter we store checkpoints for jobs > contained in the former. I think it would be nice wrt lifecycle management to > let the {{SubmittedJobGraphStore}} manage the lifecycle of the > {{CompletedCheckpointStore}}, because often it does not make much sense to > keep only checkpoints without a job or a job without checkpoints. > An idea would be when we register a job with the {{SubmittedJobGraphStore}} > then it returns a {{CompletedCheckpointStore}}. This store can then be given > to the {{CheckpointCoordinator}} to store the checkpoints. When a job enters > a terminal state it could be the responsibility of the > {{SubmittedJobGraphStore}} to decide what to do with the job data > ({{JobGraph}} and {{Checkpoints}}), e.g. keeping it or cleaning it up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-2899) The groupReduceOn* methods which take types as a parameter fail with TypeErasure
[ https://issues.apache.org/jira/browse/FLINK-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-2899: -- Labels: auto-deprioritized-major (was: stale-major) > The groupReduceOn* methods which take types as a parameter fail with > TypeErasure > > > Key: FLINK-2899 > URL: https://issues.apache.org/jira/browse/FLINK-2899 > Project: Flink > Issue Type: Bug > Components: Library / Graph Processing (Gelly) >Affects Versions: 0.10.0 >Reporter: Andra Lungu >Priority: Major > Labels: auto-deprioritized-major > > I tried calling groupReduceOnEdges(EdgesFunctionWithVertexValue<K, VV, EV, T> edgesFunction, EdgeDirection direction, TypeInformation<T> typeInfo) in > order to make the vertex-centric version of the Triangle Count library method > applicable to any kind of key and I got a TypeErasure exception. > After doing a bit of debugging (see the hack in > https://github.com/andralungu/flink/tree/trianglecount-vertexcentric), I saw > that actually the call to > TypeExtractor.createTypeInfo(NeighborsFunctionWithVertexValue.class, ...) in > ApplyNeighborCoGroupFunction does not work properly, i.e. it returns null. > From what I see, the coGroup in groupReduceOnNeighbors tries to infer a type > before "returns" is called. > I may be missing something, but that particular feature (groupReduceOn with > types) is not documented or tested so we would also need some tests for that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-12304) AvroInputFormat should support schema evolution
[ https://issues.apache.org/jira/browse/FLINK-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-12304: --- Labels: auto-deprioritized-major (was: stale-major) > AvroInputFormat should support schema evolution > --- > > Key: FLINK-12304 > URL: https://issues.apache.org/jira/browse/FLINK-12304 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.8.0 >Reporter: John >Priority: Major > Labels: auto-deprioritized-major > > From the Avro spec: > _A reader of Avro data, whether from an RPC or a file, can always parse that > data because its schema is provided. But that schema may not be exactly the > schema that was expected. For example, if the data was written with a > different version of the software than it is read, then records may have had > fields added or removed._ > The AvroInputFormat should allow the application to supply a reader's schema > to support cases where data was written with an old version of a schema and > needs to be read with a newer version. The reader's schema can have additional > fields with defaults so that the old schema can be adapted to the new. The > underlying Avro Java library supports schema resolution, so adding support in > AvroInputFormat should be straightforward. -- This message was sent by Atlassian Jira (v8.3.4#803005)
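The schema-resolution rule FLINK-12304 relies on can be sketched in plain Java: fields missing from data written with the old schema are filled from the reader schema's defaults. The real resolution is performed by the Avro library once `AvroInputFormat` accepts a reader schema; the map-based version below, including its field names, is only an illustration:

```java
// Illustrative sketch of reader-schema resolution for added fields:
// start from the reader schema's defaults, then overlay the values that
// were actually written with the old schema.
import java.util.HashMap;
import java.util.Map;

public class ReaderSchema {
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> record = new HashMap<>(readerDefaults);
        record.putAll(written); // written values win over defaults
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> oldRecord = new HashMap<>();
        oldRecord.put("id", 7L);                // field present in the old schema
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("id", 0L);
        defaults.put("region", "unknown");      // new field with a default
        System.out.println(resolve(oldRecord, defaults));
    }
}
```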
[jira] [Updated] (FLINK-5260) Add docs about how to react to cancellation
[ https://issues.apache.org/jira/browse/FLINK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-5260: -- Priority: Minor (was: Major) > Add docs about how to react to cancellation > --- > > Key: FLINK-5260 > URL: https://issues.apache.org/jira/browse/FLINK-5260 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Ufuk Celebi >Priority: Minor > Labels: auto-deprioritized-major > > Task cancellation with operators that work against external services can be a > source of confusion for users. Since users need to cooperate for this, I > would like to see some docs added about how the user can do this (which > callbacks are available, how to react to interrupts, etc.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-8297) RocksDBListState stores whole list in single byte[]
[ https://issues.apache.org/jira/browse/FLINK-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-8297: -- Priority: Minor (was: Major) > RocksDBListState stores whole list in single byte[] > --- > > Key: FLINK-8297 > URL: https://issues.apache.org/jira/browse/FLINK-8297 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.3.2, 1.4.0 >Reporter: Jan Lukavský >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > RocksDBListState currently keeps the whole list of data in a single RocksDB > key-value pair, which implies that the list actually must fit into memory. > Larger lists are not supported and end up with an OOME or other error. The > RocksDBListState could be modified so that individual items in the list are > stored under separate keys in RocksDB and can then be iterated over. A simple > implementation could reuse the existing RocksDBMapState, with the key as index into the > list and a single RocksDBValueState keeping track of how many items have > already been added to the list. Because this implementation might be less > efficient in some cases, it would be good to make it opt-in by a construct > like > {{new RocksDBStateBackend().enableLargeListsPerKey()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
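The design proposed above (one RocksDB key per element, plus a counter) can be illustrated with a minimal, purely in-memory sketch. This is not Flink API; the class and method names are made up for illustration, with a `TreeMap` standing in for `RocksDBMapState` and a plain `long` for `RocksDBValueState`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of the proposed list state layout: each element is
// stored under its own key (index -> value) instead of serializing the whole
// list into one byte[]. Appends touch one key; reads iterate keys in order.
public class KeyedListStateSketch<V> {
    private final TreeMap<Long, V> entries = new TreeMap<>(); // stands in for RocksDBMapState
    private long count = 0L;                                  // stands in for RocksDBValueState

    public void add(V value) {
        entries.put(count++, value); // one key-value pair per element
    }

    public List<V> get() {
        return new ArrayList<>(entries.values()); // iterates keys in index order
    }

    public long size() {
        return count;
    }
}
```

With a real RocksDB-backed map state, `get()` could return a lazy iterator instead of materializing the list, which is what makes lists larger than memory feasible.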
[jira] [Updated] (FLINK-8236) Allow to set the parallelism of table queries
[ https://issues.apache.org/jira/browse/FLINK-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-8236: -- Priority: Minor (was: Major) > Allow to set the parallelism of table queries > - > > Key: FLINK-8236 > URL: https://issues.apache.org/jira/browse/FLINK-8236 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.4.0 >Reporter: Timo Walther >Priority: Minor > Labels: auto-deprioritized-major > > Right now the parallelism of a table program is determined by the parallelism > of the stream/batch environment. E.g., by default, tumbling window operators > use the default parallelism of the environment. Simple project and select > operations have the same parallelism as the inputs they are applied on. > While we cannot change forwarding operations because this would change the > results when using retractions, it should be possible to change the > parallelism for operators after shuffling operations. > It should be possible to specify the default parallelism of a table program > in the {{TableConfig}} and/or {{QueryConfig}}. The configuration per query > has higher precedence than the configuration per table environment. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-2220) Join on Pojo without hashCode() silently fails
[ https://issues.apache.org/jira/browse/FLINK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-2220: -- Priority: Minor (was: Major) > Join on Pojo without hashCode() silently fails > -- > > Key: FLINK-2220 > URL: https://issues.apache.org/jira/browse/FLINK-2220 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 0.9, 0.8.1 >Reporter: Marcus Leich >Priority: Minor > Labels: auto-deprioritized-major > > I need to perform a join using a complete Pojo as join key. > With DOP > 1 this only works if the Pojo comes with a meaningful hashCode() > implementation, as otherwise equal objects will get hashed to different > partitions based on their memory address and not on the content. > I guess it's fine if users are required to implement hashCode() themselves, > but it would be nice if the documentation, or better yet, Flink itself, could alert > users that this is a requirement, similar to how Comparable is required for > keys. 
> Use the following code to reproduce the issue: > public class Pojo implements Comparable<Pojo> { > public byte[] data; > public Pojo () { > } > public Pojo (byte[] data) { > this.data = data; > } > @Override > public int compareTo(Pojo o) { > return UnsignedBytes.lexicographicalComparator().compare(data, > o.data); > } > // uncomment me for making the join work > /* @Override > public int hashCode() { > return Arrays.hashCode(data); > }*/ > } > public void testJoin () throws Exception { > final ExecutionEnvironment env = > ExecutionEnvironment.createLocalEnvironment(); > env.setParallelism(4); > DataSet<Tuple2<Pojo, String>> left = env.fromElements( > new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "black"), > new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), "red"), > new Tuple2<>(new Pojo(new byte[] {1}), "Spark"), > new Tuple2<>(new Pojo(new byte[] {2}), "good"), > new Tuple2<>(new Pojo(new byte[] {5}), "bug")); > DataSet<Tuple2<Pojo, String>> right = env.fromElements( > new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "white"), > new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), > "green"), > new Tuple2<>(new Pojo(new byte[] {1}), "Flink"), > new Tuple2<>(new Pojo(new byte[] {2}), "evil"), > new Tuple2<>(new Pojo(new byte[] {5}), "fix")); > // will not print anything unless Pojo has a real hashCode() > implementation > > left.join(right).where(0).equalTo(0).projectFirst(1).projectSecond(1).print(); > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
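The failure mode in the report can be shown without Flink at all. Hash partitioning routes a record to partition `hash % parallelism`, so two records with equal content must produce equal hash codes to meet in the same partition. The snippet below is a standalone illustration (the `partition` helper is made up, not Flink code):

```java
import java.util.Arrays;

// Why the join above silently fails: with Object's identity hashCode, two
// Pojos with equal byte[] content usually hash to different partitions, so
// the join matches nothing. A content-based hash fixes that.
public class PojoHashDemo {
    // Simplified stand-in for hash partitioning: hash % parallelism.
    static int partition(int hash, int parallelism) {
        return Math.floorMod(hash, parallelism);
    }

    public static void main(String[] args) {
        byte[] left  = {0, 24, 23, 1, 3};
        byte[] right = {0, 24, 23, 1, 3}; // equal content, distinct objects
        int parallelism = 4;

        // Content-based hash (as in the commented-out override):
        // equal keys always land in the same partition.
        int p1 = partition(Arrays.hashCode(left), parallelism);
        int p2 = partition(Arrays.hashCode(right), parallelism);
        System.out.println(p1 == p2); // true

        // With the default identity hashCode, p1 and p2 would typically
        // differ, and no pair would ever be compared for equality.
    }
}
```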
[jira] [Commented] (FLINK-19156) Migration of transactionIdHint in Kafka is never applied
[ https://issues.apache.org/jira/browse/FLINK-19156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336951#comment-17336951 ] Flink Jira Bot commented on FLINK-19156: This issue was labeled "stale-critical" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Critical, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Migration of transactionIdHint in Kafka is never applied > > > Key: FLINK-19156 > URL: https://issues.apache.org/jira/browse/FLINK-19156 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.9.0 >Reporter: Dawid Wysakowicz >Priority: Critical > Labels: stale-critical > > The code that checks if we should migrate the transaction id is as follows: > {code} > @Deprecated > private static final > ListStateDescriptor<NextTransactionalIdHint> > NEXT_TRANSACTIONAL_ID_HINT_DESCRIPTOR = > new ListStateDescriptor<>("next-transactional-id-hint", > TypeInformation.of(NextTransactionalIdHint.class)); > if > (context.getOperatorStateStore().getRegisteredStateNames().contains(NEXT_TRANSACTIONAL_ID_HINT_DESCRIPTOR)) > { > migrateNextTransactionalIdHindState(context); > } > {code} > The condition in the if statement is never met because it checks whether a > {{Set<String>}} contains an object of type {{ListStateDescriptor}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
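The bug class described in this ticket is easy to reproduce in isolation: `Set<String>.contains(Object)` compares with `equals()`, so passing a non-String key can never match, and the guarded branch becomes dead code. A standalone demonstration (no Flink types involved; the anonymous object merely stands in for the descriptor):

```java
import java.util.Collections;
import java.util.Set;

// Set<String>.contains(Object) compiles for any argument type, but a
// non-String can never equal a String, so the check is silently always false.
public class ContainsTypeMismatchDemo {
    static boolean registered(Set<String> names, Object key) {
        return names.contains(key);
    }

    public static void main(String[] args) {
        Set<String> names = Collections.singleton("next-transactional-id-hint");
        Object descriptor = new Object(); // stands in for the ListStateDescriptor

        System.out.println(registered(names, descriptor));                   // false: wrong type
        System.out.println(registered(names, "next-transactional-id-hint")); // true: compare by name
    }
}
```

The fix direction implied by the report is to pass the descriptor's name (a `String`) to `contains`, not the descriptor object itself.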
[jira] [Updated] (FLINK-3627) Task stuck on lock in StreamSource when cancelling
[ https://issues.apache.org/jira/browse/FLINK-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-3627: -- Priority: Minor (was: Major) > Task stuck on lock in StreamSource when cancelling > -- > > Key: FLINK-3627 > URL: https://issues.apache.org/jira/browse/FLINK-3627 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Jamie Grier >Priority: Minor > Labels: auto-deprioritized-major, hang > > I've seen this occur a couple of times when the # of network buffers is set > too low. The job fails with an appropriate message indicating that the > user should increase the # of network buffers. However, some of the task > threads then hang with a stack trace similar to the following. > 2016-03-16 13:38:54,017 WARN org.apache.flink.runtime.taskmanager.Task > - Task 'Source: EventGenerator -> (Flat Map, blah -> Filter -> > Projection -> Flat Map -> Timestamps/Watermarks -> Map) (46/144)' did not > react to cancelling signal, but is stuck in method: > > org.apache.flink.streaming.api.operators.StreamSource$ManualWatermarkContext.collect(StreamSource.java:317) > flink.benchmark.generator.LoadGeneratorSource.run(LoadGeneratorSource.java:38) > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:78) > org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56) > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:224) > org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) > java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11783: --- Labels: auto-deprioritized-major (was: stale-major) > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Major > Labels: auto-deprioritized-major > Attachments: flink_is_stuck.png > > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation have completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING . > Could there be a synchronization problem? > See attachment for a screenshot of the Flink UI and the stack below. 
> > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition 
[0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (68/150)" #153 prio=5 os_prio=0 > tid=0x7faa5c019800 nid=0x248a
[jira] [Commented] (FLINK-14521) CoLocationGroup is not set into JobVertex if the stream node is chained with others
[ https://issues.apache.org/jira/browse/FLINK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336395#comment-17336395 ] Flink Jira Bot commented on FLINK-14521: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > CoLocationGroup is not set into JobVertex if the stream node is chained with > others > --- > > Key: FLINK-14521 > URL: https://issues.apache.org/jira/browse/FLINK-14521 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.9.0, 1.9.1 >Reporter: Yun Gao >Priority: Major > Labels: pull-request-available, stale-major > Time Spent: 10m > Remaining Estimate: 0h > > StreamingJobGraphGenerator.isChainable does not consider the coLocationGroup, > if A -> B is chained, the coLocationGroup of the corresponding JobVertex will > be set with that of the head node, namely A. Therefore, if B has declared a > coLocationGroup but A has not, then the coLocationGroup of B will be ignored. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-8416) Kinesis consumer doc examples should demonstrate preferred default credentials provider
[ https://issues.apache.org/jira/browse/FLINK-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-8416: -- Labels: auto-deprioritized-major (was: stale-major) > Kinesis consumer doc examples should demonstrate preferred default > credentials provider > --- > > Key: FLINK-8416 > URL: https://issues.apache.org/jira/browse/FLINK-8416 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kinesis, Documentation >Reporter: Tzu-Li (Gordon) Tai >Priority: Major > Labels: auto-deprioritized-major > > The Kinesis consumer docs > [here](https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kinesis.html#kinesis-consumer) > demonstrate providing credentials by explicitly supplying the AWS Access ID > and Key. > The always preferred approach for AWS, unless running locally, is to > automatically fetch the shipped credentials from the AWS environment. > That is actually the default behaviour of the Kinesis consumer, so the docs > should demonstrate that more clearly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-2824) Iteration feedback partitioning does not work as expected
[ https://issues.apache.org/jira/browse/FLINK-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336905#comment-17336905 ] Flink Jira Bot commented on FLINK-2824: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Iteration feedback partitioning does not work as expected > - > > Key: FLINK-2824 > URL: https://issues.apache.org/jira/browse/FLINK-2824 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 0.10.0 >Reporter: Gyula Fora >Priority: Major > Labels: stale-major > > Iteration feedback partitioning is not handled transparently and can cause > serious issues if the user does not know the specific implementation details > of streaming iterations (which is not a realistic expectation). > Example: > IterativeStream it = ... (parallelism 1) > DataStream mapped = it.map(...) (parallelism 2) > // this does not work as the feedback has parallelism 2 != 1 > // it.closeWith(mapped.partitionByHash(someField)) > // so we need rebalance the data > it.closeWith(mapped.map(NoOpMap).setParallelism(1).partitionByHash(someField)) > This program will execute but the feedback will not be partitioned by hash to > the mapper instances: > The partitioning will be set from the noOpMap to the iteration sink which has > parallelism different from the mapper (1 vs 2) and then the iteration source > forwards the element to the mapper (always to 0). > So the problem is basically that the iteration source/sink pair gets the > parallelism of the input stream (p=1) not the head operator (p = 2) which > leads to incorrect partitioning. 
> Workaround: > Set input parallelism to the same as the head operator > Suggested solution: > The iteration construction should be reworked to set the parallelism of the > source/sink to the parallelism of the head operator (and validate that all > heads have the same parallelism) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-6199) Single outstanding Async I/O operation per key
[ https://issues.apache.org/jira/browse/FLINK-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-6199: -- Labels: auto-deprioritized-major (was: stale-major) > Single outstanding Async I/O operation per key > -- > > Key: FLINK-6199 > URL: https://issues.apache.org/jira/browse/FLINK-6199 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: Jamie Grier >Priority: Major > Labels: auto-deprioritized-major > > I would like to propose we extend the Async I/O semantics a bit such that a > user can guarantee a single outstanding async request per key. > This would allow a user to order async requests per key while still achieving > the throughput benefits of using Async I/O in the first place. > This is essential for operations where stream order is important but we still > need to use Async operations to interact with an external system in a > performant way. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-7640) Dashboard should display information about JobManager cluster in HA mode
[ https://issues.apache.org/jira/browse/FLINK-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336706#comment-17336706 ] Flink Jira Bot commented on FLINK-7640: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Dashboard should display information about JobManager cluster in HA mode > > > Key: FLINK-7640 > URL: https://issues.apache.org/jira/browse/FLINK-7640 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.3.2 >Reporter: Elias Levy >Priority: Major > Labels: stale-major > > Currently the dashboard provides no information about the status of a cluster > of JobManagers configured in high-availability mode. > The dashboard should display the status and membership of a JM cluster in the > Overview and Job Manager sections. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-9052) Flink complains when scala.Option is used inside POJO
[ https://issues.apache.org/jira/browse/FLINK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-9052: -- Labels: auto-deprioritized-major (was: stale-major) > Flink complains when scala.Option is used inside POJO > - > > Key: FLINK-9052 > URL: https://issues.apache.org/jira/browse/FLINK-9052 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Affects Versions: 1.4.2 >Reporter: Truong Duc Kien >Priority: Major > Labels: auto-deprioritized-major > Attachments: TypeInfomationTest.scala > > > According to the documentation, Flink has a specialized serializer for Option > type. However, when an Option field is used inside a POJO, the following > WARNING appears in TaskManagers' log. > > {{No fields detected for class scala.Option. Cannot be used as a PojoType. > Will be handled as GenericType}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-7129) Support dynamically changing CEP patterns
[ https://issues.apache.org/jira/browse/FLINK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336730#comment-17336730 ] Flink Jira Bot commented on FLINK-7129: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Support dynamically changing CEP patterns > - > > Key: FLINK-7129 > URL: https://issues.apache.org/jira/browse/FLINK-7129 > Project: Flink > Issue Type: New Feature > Components: Library / CEP >Reporter: Dawid Wysakowicz >Priority: Major > Labels: stale-major > > An umbrella task for introducing a mechanism for injecting patterns through > coStream -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-8844) Export job jar file name or job version property via REST API
[ https://issues.apache.org/jira/browse/FLINK-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-8844: -- Priority: Minor (was: Major) > Export job jar file name or job version property via REST API > - > > Key: FLINK-8844 > URL: https://issues.apache.org/jira/browse/FLINK-8844 > Project: Flink > Issue Type: Improvement > Components: Runtime / REST >Affects Versions: 1.4.3 >Reporter: Elias Levy >Priority: Minor > Labels: auto-deprioritized-major > > To aid automated deployment of jobs, it would be useful if the REST API > exposed either a running job's jar filename or a version property the job > could set, similar to how it sets the job name. > As it is now there is no standard mechanism to determine what version of a > job is running in a cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-11030) Cannot use Avro logical types with ConfluentRegistryAvroDeserializationSchema
[ https://issues.apache.org/jira/browse/FLINK-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336546#comment-17336546 ] Flink Jira Bot commented on FLINK-11030: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Cannot use Avro logical types with ConfluentRegistryAvroDeserializationSchema > - > > Key: FLINK-11030 > URL: https://issues.apache.org/jira/browse/FLINK-11030 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.6.2 >Reporter: Maciej Bryński >Priority: Major > Labels: pull-request-available, stale-major > Time Spent: 20m > Remaining Estimate: 0h > > I created a Specific class for a Kafka topic. > The Avro schema includes logicalTypes. > Then I want to read data using the following code: > {code:scala} > val deserializationSchema = > ConfluentRegistryAvroDeserializationSchema.forSpecific(classOf[mySpecificClass], > schemaRegistryUrl) > val kafkaStream = env.addSource( > new FlinkKafkaConsumer011(topic, deserializationSchema, kafkaProperties) > ) > kafkaStream.print() > {code} > Result: > {code} > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: > java.lang.ClassCastException: java.lang.Long cannot be cast to > org.joda.time.DateTime > at > org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:623) > at > org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1511) > at > org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:645) > at TransactionEnrichment$.main(TransactionEnrichment.scala:50) > at 
TransactionEnrichment.main(TransactionEnrichment.scala) > Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to > org.joda.time.DateTime > at platform_tbl_game_transactions_v1.Value.put(Value.java:222) > at org.apache.avro.generic.GenericData.setField(GenericData.java:690) > at > org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:119) > at > org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) > at > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) > at > org.apache.flink.formats.avro.RegistryAvroDeserializationSchema.deserialize(RegistryAvroDeserializationSchema.java:74) > at > org.apache.flink.streaming.util.serialization.KeyedDeserializationSchemaWrapper.deserialize(KeyedDeserializationSchemaWrapper.java:44) > at > org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:142) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:738) > at > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87) > at > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56) > at > org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > {code} > When using Kafka Consumer there was a hack for this to use LogicalConverters. > Unfortunately it's not working in flink. 
> {code} > SpecificData.get.addLogicalTypeConversion(new > TimeConversions.TimestampConversion) > {code} > The problem is probably caused by the fact that we're creating our own instance of > SpecificData > https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/AvroDeserializationSchema.java#L145 > And there are no logical conversions added. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-6473) Add OVER window support for batch tables
[ https://issues.apache.org/jira/browse/FLINK-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-6473: -- Labels: auto-deprioritized-major (was: stale-major) > Add OVER window support for batch tables > > > Key: FLINK-6473 > URL: https://issues.apache.org/jira/browse/FLINK-6473 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Reporter: Fabian Hueske >Priority: Major > Labels: auto-deprioritized-major > > Add support for OVER windows for batch tables. > Since OVER windows are supported for streaming tables, this issue is not > about the API (which is available) but about adding the execution strategies > and translation for OVER windows on batch tables. > The feature could be implemented using the following plans > *UNBOUNDED OVER* > {code} > DataSet[Row] input = ... > DataSet[Row] result = input > .groupBy(partitionKeys) > .sortGroup(orderByKeys) > .reduceGroup(computeAggregates) > {code} > This implementation is quite straightforward because we don't need to retract > rows. > *BOUNDED OVER* > A bit more challenging are BOUNDED OVER windows, because we need to retract > values from aggregates and we don't want to store rows temporarily on the > heap. > {code} > DataSet[Row] input = ... > DataSet[Row] sorted = input > .partitionByHash(partitionKey) > .sortPartition(partitionKeys, orderByKeys) > DataSet[Row] result = sorted.coGroup(sorted) > .where(partitionKey).equalTo(partitionKey) > .with(computeAggregates) > {code} > With this, the data set should be partitioned and sorted once. The sorted > {{DataSet}} would be consumed twice (the optimizer should inject a temp > barrier on one of the inputs to avoid a consumption deadlock). The > {{CoGroupFunction}} would accumulate new rows into the aggregates from one > input and retract them from the other. Since both input streams are properly > sorted, this can happen in a zigzag fashion. 
We need to verify that the > generated plan is what we want it to be. -- This message was sent by Atlassian Jira (v8.3.4#803005)
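The accumulate/retract idea behind the BOUNDED OVER plan above can be sketched independently of Flink. Assuming rows are already sorted per partition, the aggregate for a window of the N preceding rows plus the current row is maintained incrementally: accumulate the newly arriving row, retract the row that falls out of range. A minimal sketch for a sum aggregate (illustrative code, not the Flink `CoGroupFunction`):

```java
// Incremental bounded-OVER aggregation over sorted input: one pass, O(1)
// work per row, no per-window re-scan and no buffering of whole windows.
public class BoundedOverSketch {
    // Sum over a window of `precedingRows` preceding rows plus the current row.
    static long[] boundedSum(long[] sortedValues, int precedingRows) {
        long[] result = new long[sortedValues.length];
        long acc = 0L;
        for (int i = 0; i < sortedValues.length; i++) {
            acc += sortedValues[i];                          // accumulate the new row
            if (i > precedingRows) {
                acc -= sortedValues[i - precedingRows - 1];  // retract the expired row
            }
            result[i] = acc;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] out = boundedSum(new long[] {1, 2, 3, 4}, 1); // windows of size 2
        System.out.println(java.util.Arrays.toString(out));  // [1, 3, 5, 7]
    }
}
```

In the proposed plan, the two sorted inputs of the self-coGroup play exactly these two roles: one side feeds the accumulate step, the other the retract step, advancing in lock-step ("zigzag") over the same sorted data.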
[jira] [Commented] (FLINK-12304) AvroInputFormat should support schema evolution
[ https://issues.apache.org/jira/browse/FLINK-12304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336488#comment-17336488 ] Flink Jira Bot commented on FLINK-12304: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > AvroInputFormat should support schema evolution > --- > > Key: FLINK-12304 > URL: https://issues.apache.org/jira/browse/FLINK-12304 > Project: Flink > Issue Type: Bug > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.8.0 >Reporter: John >Priority: Major > Labels: stale-major > > From the avro spec: > _A reader of Avro data, whether from an RPC or a file, can always parse that > data because its schema is provided. But that schema may not be exactly the > schema that was expected. For example, if the data was written with a > different version of the software than it is read, then records may have had > fields added or removed._ > The AvroInputFormat should allow the application to supply a reader's schema > to support cases where data was written with an old version of a schema and > needs to be read with a newer version. The reader's schema can have additional > fields with defaults so that the old schema can be adapted to the new. The > underlying avro java library supports schema resolution, so adding support in > AvroInputFormat should be straightforward. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-7865) Remove predicate restrictions on TableFunction left outer join
[ https://issues.apache.org/jira/browse/FLINK-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336698#comment-17336698 ] Flink Jira Bot commented on FLINK-7865: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Remove predicate restrictions on TableFunction left outer join > -- > > Key: FLINK-7865 > URL: https://issues.apache.org/jira/browse/FLINK-7865 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Reporter: Xingcan Cui >Priority: Major > Labels: stale-major > > To cover up the improper translation of lateral table left outer join > (CALCITE-2004), we have temporarily forbidden the predicates (except {{true}} > literal) in Table API (FLINK-7853) and SQL (FLINK-7854). Once the issue has > been fixed in Calcite, we should remove the restrictions. The tasks may > include removing Table API/SQL condition check, removing validation tests, > enabling integration tests, updating the documents, etc. > See [this thread on Calcite dev > list|https://lists.apache.org/thread.html/16caeb8b1649c4da85f9915ea723c6c5b3ced0b96914cadc24ee4e15@%3Cdev.calcite.apache.org%3E] > for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-9565) Evaluating scalar UDFs in parallel
[ https://issues.apache.org/jira/browse/FLINK-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336623#comment-17336623 ] Flink Jira Bot commented on FLINK-9565: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Evaluating scalar UDFs in parallel > -- > > Key: FLINK-9565 > URL: https://issues.apache.org/jira/browse/FLINK-9565 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Affects Versions: 1.4.2 >Reporter: yinhua.dai >Priority: Major > Labels: stale-major > > As per > [https://stackoverflow.com/questions/50790023/does-flink-sql-support-to-run-projections-in-parallel,] > scalar UDFs in the same SQL statement are always evaluated sequentially, even when > those UDFs are independent of one another, which may increase latency when a UDF is a > time-consuming function. > It would be great if Flink SQL supported running those UDFs in parallel to > reduce calculation latency. > > cc [~fhueske] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-12298) Make column functions accept custom Range class rather than Expression
[ https://issues.apache.org/jira/browse/FLINK-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336491#comment-17336491 ] Flink Jira Bot commented on FLINK-12298: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Make column functions accept custom Range class rather than Expression > -- > > Key: FLINK-12298 > URL: https://issues.apache.org/jira/browse/FLINK-12298 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Reporter: Dawid Wysakowicz >Priority: Major > Labels: stale-major > > I would suggest reworking the column functions into a more typesafe approach > with a custom {{Range}} class. Right now {{withColumns}} accepts an array of > Expressions. We have the > {{org.apache.flink.table.api.scala.ImplicitExpressionOperations#to}} method, > but we also have an implicit conversion from {{scala.Range}} to an Expression > range. This already introduces ambiguity, as it is unclear what the end > product of an expression like {{1 to 9}} will be. This approach defers the checking > of the types of expressions to the expression resolution phase. > I would suggest making > {{org.apache.flink.table.api.scala.withColumns#apply}} always accept e.g. a > {{ColumnRange}} that could only ever accept {{Integer}} or {{String}} > instead of Expressions. Such a class could look like (this is just a very rough > outline): > {code} > class ColumnRange { > IndexRange idx(Integer idx); > ReferenceRange ref(String reference); > } > class IndexRange { > IndexRange to(Integer idx); > } > class ReferenceRange { > ReferenceRange to(String ref); > } > {code} > We could also have an implicit conversion from {{scala.Range}} to {{IndexRange}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
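The rough outline in FLINK-12298 can be made concrete. The following is a minimal, runnable sketch of the proposed typesafe range builder in plain Java; the class and method names follow the ticket's outline and are hypothetical, not Flink API:

```java
public class ColumnRangeDemo {
    // Sketch of the proposed typesafe ranges: an index range and a reference
    // (column name) range are distinct types, so "1 to 9" can never be
    // confused with an arbitrary Expression.
    static final class IndexRange {
        final int from;
        final int to;
        IndexRange(int from, int to) { this.from = from; this.to = to; }
        IndexRange to(int end) { return new IndexRange(from, end); }
    }

    static final class ReferenceRange {
        final String from;
        final String to;
        ReferenceRange(String from, String to) { this.from = from; this.to = to; }
        ReferenceRange to(String end) { return new ReferenceRange(from, end); }
    }

    // Entry points corresponding to the ticket's idx(Integer) / ref(String).
    static IndexRange idx(int i) { return new IndexRange(i, i); }
    static ReferenceRange ref(String r) { return new ReferenceRange(r, r); }

    public static void main(String[] args) {
        IndexRange r = idx(1).to(9);          // unambiguously a column index range
        System.out.println(r.from + ".." + r.to);
        ReferenceRange n = ref("a").to("c");  // unambiguously a column name range
        System.out.println(n.from + ".." + n.to);
    }
}
```

The type checking happens at compile time rather than in the expression resolution phase, which is the point of the proposal.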
[jira] [Updated] (FLINK-8663) Execution of DataStreams result in non functionality of size of Window for countWindow
[ https://issues.apache.org/jira/browse/FLINK-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-8663: -- Labels: auto-deprioritized-major (was: stale-major) > Execution of DataStreams result in non functionality of size of Window for > countWindow > -- > > Key: FLINK-8663 > URL: https://issues.apache.org/jira/browse/FLINK-8663 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.4.0 > Environment: package com.vnl.stocks; > import java.util.concurrent.TimeUnit; > import org.apache.flink.api.common.functions.MapFunction; > import org.apache.flink.streaming.api.datastream.AllWindowedStream; > import org.apache.flink.streaming.api.datastream.DataStream; > import org.apache.flink.streaming.api.datastream.WindowedStream; > import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; > import > org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows; > import org.apache.flink.streaming.api.windowing.time.Time; > import org.apache.flink.streaming.api.windowing.windows.GlobalWindow; > import org.apache.flink.streaming.api.windowing.windows.TimeWindow; > public class StocksProcessing { > > public static void main(String[] args) throws Exception { > final StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutionEnvironment(); > > //Read from a socket stream at map it to StockPrice objects > DataStream socketStockStream = env > .socketTextStream("localhost", ) > .map(new MapFunction() { > private String[] tokens; > > @Override > public StockPrice map(String value) throws > Exception { > tokens = value.split(","); > return new StockPrice(tokens[0], > Double.parseDouble(tokens[1])); > } > }); > > socketStockStream.print(); > //Generate other stock streams > DataStream SPX_stream = env.addSource(new > StockSource("SPX", 10)); > // DataStream FTSE_stream = env.addSource(new > StockSource("FTSE", 20)); > // DataStream DJI_stream = 
env.addSource(new > StockSource("DJI", 30)); > // DataStream BUX_stream = env.addSource(new > StockSource("BUX", 40)); > > //Merge all stock streams together > > DataStream stockStream = > socketStockStream.union(SPX_stream/*, FTSE_stream, DJI_stream, BUX_stream*/); > > > // stockStream.print(); > Thread.sleep(100); > > AllWindowedStream windowedStream = > stockStream > .countWindowAll(10, 5); > > //.keyBy("symbol") > //.timeWindowAll(Time.of(10, TimeUnit.SECONDS), > Time.of(1, TimeUnit.SECONDS)); > > //stockStream.keyBy("symbol"); > //Compute some simple statistics on a rolling window > DataStream lowest = > windowedStream.maxBy("price"); > //DataStream highest = windowedStream.; > /*DataStream maxByStock = > windowedStream.groupBy("symbol") > .maxBy("price").flatten(); > DataStream rollingMean = > windowedStream.groupBy("symbol") > .mapWindow(new WindowMean()).flatten();*/ > lowest.print(); > > Thread.sleep(100); > /* > AllWindowedStream > windowedStream1 = lowest > .countWindowAll(5,2); > //windowedStream1.print(); > DataStream highest = > windowedStream1.minBy("price");*/ > //highest.print(); > > env.execute("Stock stream"); > } > } >Reporter: Subham >Priority: Major > Labels: auto-deprioritized-major > > I used AllWindowedStream to process a stream and generate > maximum of my window using countWindowAll functions. In this output the size > and slide of window works incorrectly. > Refer below example for the bug > Initial stream : 1,2,3,4,5,6. > Output 1: (Find min for window 10,5) : 1,6,11.(This is correct) >
[jira] [Updated] (FLINK-3983) Allow users to set any (relevant) configuration parameter of the KinesisProducerConfiguration
[ https://issues.apache.org/jira/browse/FLINK-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-3983: -- Labels: auto-deprioritized-major (was: stale-major) > Allow users to set any (relevant) configuration parameter of the > KinesisProducerConfiguration > - > > Key: FLINK-3983 > URL: https://issues.apache.org/jira/browse/FLINK-3983 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common, Connectors / Kinesis >Affects Versions: 1.1.0 >Reporter: Robert Metzger >Priority: Major > Labels: auto-deprioritized-major > > Currently, users can only set some of the configuration parameters in the > {{KinesisProducerConfiguration}} through Properties. > It would be good to introduce configuration keys for these keys so that users > can change the producer configuration. > I think these and most of the other variables in the > KinesisProducerConfiguration should be exposed via properties: > - aggregationEnabled > - collectionMaxCount > - collectionMaxSize > - connectTimeout > - credentialsRefreshDelay > - failIfThrottled > - logLevel > - metricsGranularity > - metricsLevel > - metricsNamespace > - metricsUploadDelay -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-10276) Job Manager and Task Manager Metrics Reporter Ports Configuration
[ https://issues.apache.org/jira/browse/FLINK-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336585#comment-17336585 ] Flink Jira Bot commented on FLINK-10276: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Job Manager and Task Manager Metrics Reporter Ports Configuration > - > > Key: FLINK-10276 > URL: https://issues.apache.org/jira/browse/FLINK-10276 > Project: Flink > Issue Type: New Feature > Components: Runtime / Configuration, Runtime / Metrics >Reporter: Deirdre Kong >Priority: Major > Labels: stale-major, starter > > *Problem Statement:* > When deploying Flink using YARN, the job manager and task manager can be on > the same node or different nodes. Say I specify the port range to be > 9249-9250: if JM and TM are deployed on the same node, the port for JM will > be 9249 and the port for TM will be 9250. If JM and TM are deployed on > different nodes, then the port for both JM and TM will be 9249. > I can only configure Prometheus once for the ports to scrape JM and TM > metrics. In this case, I won't know whether port 9249 is for JM or TM. It > would be great if we could specify in flink-conf.yaml which port we want for the > JM reporter and the TM reporters. > *Comment from Till:* > I think we could extend Vino's proposal for Yarn as well: Maybe it makes > sense to allow to override certain configuration settings for the > TaskManagers when deploying on Yarn. That way one could define a fixed port > for the JM and a port range for the TMs. Having such a distinction you can > configure your Prometheus to scrape for the single JM and the TMs > individually. However, Flink does not yet support such a feature. You can > open a JIRA issue to track the problem. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-9907) add CRC32 checksum in table Api and sql
[ https://issues.apache.org/jira/browse/FLINK-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-9907: -- Priority: Minor (was: Major) > add CRC32 checksum in table Api and sql > --- > > Key: FLINK-9907 > URL: https://issues.apache.org/jira/browse/FLINK-9907 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Reporter: xueyu >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > > CRC32 returns the cyclic redundancy check value of a given string as a 32-bit > unsigned value, or null if the string is null. In MySQL and Hive the syntax looks like > select CRC32('test') = 3632233996 > Though it is not a hash function, the pattern and behavior look > similar to hash functions (md5, sha1, sha2, etc.), so I put the codegen in > HashCalcCallGen -- This message was sent by Atlassian Jira (v8.3.4#803005)
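The checksum semantics described in FLINK-9907 can be reproduced with the JDK's built-in `java.util.zip.CRC32`; a minimal sketch (the `crc32` helper is illustrative, not the actual Table API function, and the `-1` return is a stand-in for SQL NULL):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32Demo {
    // Computes the CRC32 checksum of a string as an unsigned 32-bit value,
    // matching the MySQL/Hive CRC32() semantics cited in the ticket.
    // Returns -1 here as a placeholder for SQL NULL when the input is null.
    static long crc32(String s) {
        if (s == null) {
            return -1L; // sketch only: SQL CRC32(NULL) is NULL
        }
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // getValue() already yields the unsigned 32-bit value in a long
    }

    public static void main(String[] args) {
        System.out.println(crc32("test")); // prints 3632233996, matching the MySQL example
    }
}
```

Because `CRC32.getValue()` returns a `long`, the unsigned 32-bit range is represented without overflow, which is why the ticket describes the result type as a 32-bit unsigned value.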
[jira] [Commented] (FLINK-12303) Scala 2.12 lambdas does not work in event classes inside streams.
[ https://issues.apache.org/jira/browse/FLINK-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336489#comment-17336489 ] Flink Jira Bot commented on FLINK-12303: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Scala 2.12 lambdas does not work in event classes inside streams. > - > > Key: FLINK-12303 > URL: https://issues.apache.org/jira/browse/FLINK-12303 > Project: Flink > Issue Type: Bug > Components: API / DataStream, API / Scala >Affects Versions: 1.7.2 > Environment: Scala 2.11/2.12, Oracle Java 1.8.0_172 >Reporter: Matěj Novotný >Priority: Major > Labels: stale-major > > When you use lambdas inside event classes used in streams, it works in > Scala 2.11. It stopped working in Scala 2.12. It compiles but does not > process any data and does not throw any exception. I would expect that it > would either fail to compile when an unsupported field is used in the event > class, or at least throw some exception. > > For more detail, please check my demonstration repo: > [https://github.com/matej-novotny/flink-lambda-bug] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-8179) Convert CharSequence to String when registering / converting a Table
[ https://issues.apache.org/jira/browse/FLINK-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-8179: -- Priority: Minor (was: Major) > Convert CharSequence to String when registering / converting a Table > > > Key: FLINK-8179 > URL: https://issues.apache.org/jira/browse/FLINK-8179 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.5.0 >Reporter: Fabian Hueske >Priority: Minor > Labels: auto-deprioritized-major > > Avro objects store text values as `java.lang.CharSequence`. Right now, these > fields are treated as `ANY` type by Calcite (`GenericType` by Flink). So, > importing a `DataStream` or `DataSet` of Avro objects results in Table where > the text values cannot be used as String fields. > We should convert `CharSequence` fields to `String` when importing / > converting to a Table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-11196) Extend S3 EntropyInjector to use key replacement (instead of key removal) when creating checkpoint metadata files
[ https://issues.apache.org/jira/browse/FLINK-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336537#comment-17336537 ] Flink Jira Bot commented on FLINK-11196: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Extend S3 EntropyInjector to use key replacement (instead of key removal) > when creating checkpoint metadata files > - > > Key: FLINK-11196 > URL: https://issues.apache.org/jira/browse/FLINK-11196 > Project: Flink > Issue Type: Improvement > Components: FileSystems >Affects Versions: 1.7.0 >Reporter: Mark Cho >Priority: Major > Labels: pull-request-available, stale-major > Time Spent: 10m > Remaining Estimate: 0h > > We currently use S3 entropy injection when writing out checkpoint data. > We also use external checkpoints so that we can resume from a checkpoint > metadata file later. > The current implementation of the S3 entropy injector makes it difficult to > locate the checkpoint metadata files since in the newer versions of Flink, > the `state.checkpoints.dir` configuration controls where the metadata and state > files are written, instead of having two separate paths (one for metadata, > one for state files). > With entropy injection, we replace the entropy marker in the path specified > by `state.checkpoints.dir` with entropy (for state files) or we strip out the > marker (for metadata files). > > We need to extend the entropy injection so that we can replace the entropy > marker with a predictable path (instead of removing it) so that we can do a > prefix query for just the metadata files. > By not using the entropy key replacement (defaults to empty string), you get > the same behavior as today (entropy marker removed). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-11783) Deadlock during Join operation
[ https://issues.apache.org/jira/browse/FLINK-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336519#comment-17336519 ] Flink Jira Bot commented on FLINK-11783: This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Deadlock during Join operation > -- > > Key: FLINK-11783 > URL: https://issues.apache.org/jira/browse/FLINK-11783 > Project: Flink > Issue Type: Bug > Components: API / DataSet >Affects Versions: 1.7.2 >Reporter: Julien Nioche >Priority: Major > Labels: stale-major > Attachments: flink_is_stuck.png > > > I am running a filtering job on a large dataset with Flink running in > distributed mode. Most tasks in the Join operation completed a while ago > and only the tasks from a particular TaskManager are still running. These > tasks make progress but extremely slowly. > When logging onto the machine running this TM I can see that all threads are > TIMED_WAITING. > Could there be a synchronization problem? > See the attachment for a screenshot of the Flink UI and the stack below. 
> > *{{$ jstack 9183 | grep -A 15 "DataSetFilterJob"}}* > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (66/150)" #155 prio=5 os_prio=0 > tid=0x7faa5c01c000 nid=0x248c waiting on condition [0x7fa9d15d5000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007bfa89578> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:80)}} > {{--}} > {{"CHAIN Join (Join at with(JoinOperator.java:543)) -> Map (Map at > (DataSetFilterJob.java:67)) (65/150)" #154 prio=5 os_prio=0 > tid=0x7faa5c01b000 nid=0x248b waiting on condition 
[0x7fa9d14d4000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x0007b8e0eb50> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)}} > {{ at > java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:98)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockReader.getNextReturnedBlock(AsynchronousBlockReader.java:43)}} > {{ at > org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:228)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.advance(AbstractPagedInputView.java:158)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readByte(AbstractPagedInputView.java:271)}} > {{ at > org.apache.flink.runtime.memory.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:278)}} > {{ at org.apache.flink.types.StringValue.readString(StringValue.java:746)}} > {{ at > org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)}} > {{ at >
[jira] [Commented] (FLINK-17912) KafkaShuffleITCase.testAssignedToPartitionEventTime: "Watermark should always increase"
[ https://issues.apache.org/jira/browse/FLINK-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336963#comment-17336963 ] Flink Jira Bot commented on FLINK-17912: This issue was labeled "stale-critical" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Critical, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > KafkaShuffleITCase.testAssignedToPartitionEventTime: "Watermark should always > increase" > --- > > Key: FLINK-17912 > URL: https://issues.apache.org/jira/browse/FLINK-17912 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Tests >Affects Versions: 1.11.0, 1.12.0 >Reporter: Robert Metzger >Priority: Critical > Labels: stale-critical, test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2062=logs=1fc6e7bf-633c-5081-c32a-9dea24b05730=0d9ad4c1-5629-5ffc-10dc-113ca91e23c5 > {code} > 2020-05-22T21:16:24.7188044Z > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
> 2020-05-22T21:16:24.7188796Z at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147) > 2020-05-22T21:16:24.7189596Z at > org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:677) > 2020-05-22T21:16:24.7190352Z at > org.apache.flink.streaming.util.TestStreamEnvironment.execute(TestStreamEnvironment.java:81) > 2020-05-22T21:16:24.7191261Z at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1673) > 2020-05-22T21:16:24.7191824Z at > org.apache.flink.test.util.TestUtils.tryExecute(TestUtils.java:35) > 2020-05-22T21:16:24.7192325Z at > org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testAssignedToPartition(KafkaShuffleITCase.java:296) > 2020-05-22T21:16:24.7192962Z at > org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testAssignedToPartitionEventTime(KafkaShuffleITCase.java:126) > 2020-05-22T21:16:24.7193436Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2020-05-22T21:16:24.7193999Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2020-05-22T21:16:24.7194720Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2020-05-22T21:16:24.7195226Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2020-05-22T21:16:24.7195864Z at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > 2020-05-22T21:16:24.7196574Z at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 2020-05-22T21:16:24.7197511Z at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > 2020-05-22T21:16:24.7198020Z at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 2020-05-22T21:16:24.7198494Z at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > 
2020-05-22T21:16:24.7199128Z at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > 2020-05-22T21:16:24.7199689Z at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > 2020-05-22T21:16:24.7200308Z at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > 2020-05-22T21:16:24.7200645Z at java.lang.Thread.run(Thread.java:748) > 2020-05-22T21:16:24.7201029Z Caused by: > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > 2020-05-22T21:16:24.7201643Z at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) > 2020-05-22T21:16:24.7202275Z at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) > 2020-05-22T21:16:24.7202863Z at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) > 2020-05-22T21:16:24.7203525Z at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) > 2020-05-22T21:16:24.7204072Z at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) > 2020-05-22T21:16:24.7204618Z at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) > 2020-05-22T21:16:24.7205255Z at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:386) > 2020-05-22T21:16:24.7205716Z at >
[jira] [Updated] (FLINK-9268) RockDB errors from WindowOperator
[ https://issues.apache.org/jira/browse/FLINK-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-9268: -- Labels: auto-deprioritized-major (was: stale-major) > RockDB errors from WindowOperator > - > > Key: FLINK-9268 > URL: https://issues.apache.org/jira/browse/FLINK-9268 > Project: Flink > Issue Type: Bug > Components: API / DataStream, Runtime / State Backends >Affects Versions: 1.4.2 >Reporter: Narayanan Arunachalam >Priority: Major > Labels: auto-deprioritized-major > > The job has no sinks, one Kafka source, does a windowing based on session and > uses processing time. The job fails with the error given below after running > for few hours. The only way to recover from this error is to cancel the job > and start a new one. > Using S3 backend for externalized checkpoints. > A representative job DAG: > val streams = sEnv > .addSource(makeKafkaSource(config)) > .map(makeEvent) > .keyBy(_.get(EVENT_GROUP_ID)) > .window(ProcessingTimeSessionWindows.withGap(Time.seconds(60))) > .trigger(PurgingTrigger.of(ProcessingTimeTrigger.create())) > .apply(makeEventsList) > .addSink(makeNoOpSink) > A representative config: > state.backend=rocksDB > checkpoint.enabled=true > external.checkpoint.enabled=true > checkpoint.mode=AT_LEAST_ONCE > checkpoint.interval=90 > checkpoint.timeout=30 > Error: > TimerException\{java.lang.NegativeArraySizeException} > at > org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:252) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NegativeArraySizeException > at org.rocksdb.RocksDB.get(Native Method) > at org.rocksdb.RocksDB.get(RocksDB.java:810) > at > org.apache.flink.contrib.streaming.state.RocksDBListState.get(RocksDBListState.java:86) > at > org.apache.flink.contrib.streaming.state.RocksDBListState.get(RocksDBListState.java:49) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onProcessingTime(WindowOperator.java:496) > at > org.apache.flink.streaming.api.operators.HeapInternalTimerService.onProcessingTime(HeapInternalTimerService.java:255) > at > org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:249) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-6354) Add documentation for migration away from Checkpointed interface
[ https://issues.apache.org/jira/browse/FLINK-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336765#comment-17336765 ] Flink Jira Bot commented on FLINK-6354: --- This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > Add documentation for migration away from Checkpointed interface > > > Key: FLINK-6354 > URL: https://issues.apache.org/jira/browse/FLINK-6354 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Documentation >Affects Versions: 1.2.0, 1.2.1, 1.3.0 >Reporter: Aljoscha Krettek >Priority: Major > Labels: stale-major > > This should follow the procedure outlined in FLINK-6353. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-12525) Impose invariant StreamExecutionEnvironment.setBufferTimeout > 0
[ https://issues.apache.org/jira/browse/FLINK-12525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-12525: --- Priority: Minor (was: Major) > Impose invariant StreamExecutionEnvironment.setBufferTimeout > 0 > > > Key: FLINK-12525 > URL: https://issues.apache.org/jira/browse/FLINK-12525 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Runtime / Network >Reporter: Robert Stoll >Priority: Minor > Labels: auto-deprioritized-major > > The documentation for the [DataStream > API|https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/docs/dev/datastream_api.md#controlling-latency] > states: > {quote}buffer timeout of 0 should be avoided, because it can cause severe > performance degradation. > {quote} > I don't know if the documentation is not appropriate and there are valid > cases where a timeout of 0 makes sense. But if not, then the invariant should > not be > {code} > if (timeoutMillis < -1) { > throw new IllegalArgumentException("Timeout of buffer must be non-negative > or -1"); > } > {code} > But {{timeoutMillis < 0}} (can also be a second invariant) > IMO it is bad practice to state it only in the documentation. The API should > guide the user in this case (in this sense a second invariant stating the > quote above would make more sense). -- This message was sent by Atlassian Jira (v8.3.4#803005)
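A sketch of the stricter precondition FLINK-12525 proposes, assuming a hypothetical `validateBufferTimeout` helper (Flink's actual check is the `timeoutMillis < -1` test quoted in the ticket; rejecting 0 is the proposed change, not current behavior):

```java
public class BufferTimeoutCheck {
    // Sketch of the proposed invariant: -1 (flush only when the buffer is
    // full) and positive timeouts are accepted, but 0 is rejected because
    // the docs warn it can cause severe performance degradation.
    static long validateBufferTimeout(long timeoutMillis) {
        if (timeoutMillis < -1) {
            throw new IllegalArgumentException(
                "Timeout of buffer must be non-negative or -1");
        }
        if (timeoutMillis == 0) {
            // The second invariant suggested in the ticket: guide the user
            // through the API instead of only warning in the documentation.
            throw new IllegalArgumentException(
                "A buffer timeout of 0 should be avoided; it can cause severe performance degradation");
        }
        return timeoutMillis;
    }

    public static void main(String[] args) {
        System.out.println(validateBufferTimeout(100)); // accepted
        System.out.println(validateBufferTimeout(-1));  // accepted: flush on full buffer only
    }
}
```

Whether 0 should be hard-rejected or merely warned about is exactly the open question in the ticket; this sketch shows the hard-reject variant.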
[jira] [Updated] (FLINK-7271) ExpressionReducer does not optimize string-to-time conversion
[ https://issues.apache.org/jira/browse/FLINK-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-7271: -- Priority: Minor (was: Major) > ExpressionReducer does not optimize string-to-time conversion > - > > Key: FLINK-7271 > URL: https://issues.apache.org/jira/browse/FLINK-7271 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.3.1 >Reporter: Timo Walther >Priority: Minor > Labels: auto-deprioritized-major > > Expressions like {{"1996-11-10".toDate}} or {{"1996-11-10 > 12:12:12".toTimestamp}} are not recognized by the ExpressionReducer and are > evaluated during runtime instead of pre-flight phase. In order to optimize > the runtime we should allow constant expression reduction here. -- This message was sent by Atlassian Jira (v8.3.4#803005)
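The reason expressions like {{"1996-11-10".toDate}} are safe to reduce at pre-flight time is that parsing a date/time literal is deterministic: evaluating it once at planning time yields the same value as evaluating it per record at runtime. A sketch of that conversion with `java.time` (the helper names are illustrative, not Flink's ExpressionReducer):

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ConstantFoldDemo {
    // Deterministic conversion of a date literal (ISO yyyy-MM-dd), the kind
    // of work an expression reducer could do once during the pre-flight phase.
    static Date toDate(String literal) {
        return Date.valueOf(LocalDate.parse(literal));
    }

    // Deterministic conversion of a timestamp literal (yyyy-MM-dd HH:mm:ss).
    static Timestamp toTimestamp(String literal) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        return Timestamp.valueOf(LocalDateTime.parse(literal, f));
    }

    public static void main(String[] args) {
        // Constant folding: evaluate once here, embed the result in the plan,
        // and the runtime never re-parses the string.
        System.out.println(toDate("1996-11-10"));
        System.out.println(toTimestamp("1996-11-10 12:12:12"));
    }
}
```

Evaluating such conversions per record, as the ticket describes, repeats this parsing work for every row even though the input literal never changes.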
[jira] [Updated] (FLINK-8179) Convert CharSequence to String when registering / converting a Table
[ https://issues.apache.org/jira/browse/FLINK-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-8179: -- Labels: auto-deprioritized-major (was: stale-major) > Convert CharSequence to String when registering / converting a Table > > > Key: FLINK-8179 > URL: https://issues.apache.org/jira/browse/FLINK-8179 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.5.0 >Reporter: Fabian Hueske >Priority: Major > Labels: auto-deprioritized-major > > Avro objects store text values as `java.lang.CharSequence`. Right now, these > fields are treated as `ANY` type by Calcite (`GenericType` by Flink). So, > importing a `DataStream` or `DataSet` of Avro objects results in Table where > the text values cannot be used as String fields. > We should convert `CharSequence` fields to `String` when importing / > converting to a Table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-5429) Code generate types between operators in Table API
[ https://issues.apache.org/jira/browse/FLINK-5429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-5429: -- Priority: Minor (was: Major) > Code generate types between operators in Table API > -- > > Key: FLINK-5429 > URL: https://issues.apache.org/jira/browse/FLINK-5429 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Legacy Planner >Reporter: Timo Walther >Priority: Minor > Labels: auto-deprioritized-major > > Currently, the Table API uses the generic Row type for shipping records > between operators in the underlying DataSet and DataStream API. For efficiency > reasons we should code-generate those records. The final design is up for > discussion but here are some ideas: > A row like {{(a: INT NULL, b: INT NOT NULL, c: STRING)}} could look like > {code} > final class GeneratedRow$123 { > public boolean a_isNull; > public int a; > public int b; > public String c; > } > {code} > Types could be generated using Janino in the pre-flight phase. The generated > types should use primitive types wherever possible. -- This message was sent by Atlassian Jira (v8.3.4#803005)
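A hand-written equivalent of what such a Janino-generated class could look like (class and accessor names here are illustrative, not part of any Flink API): the nullable INT is stored as a primitive plus an explicit null flag, avoiding the boxing that a generic `Row` of `Object` fields would incur.

```java
public class GeneratedRowSketch {

    // Sketch of a generated type for (a: INT NULL, b: INT NOT NULL, c: STRING).
    static final class GeneratedRow {
        boolean a_isNull; // null flag for the nullable field
        int a;            // nullable INT: primitive + flag, no Integer boxing
        int b;            // NOT NULL INT: plain primitive, no flag needed
        String c;

        // Illustrative accessor that re-materializes SQL NULL semantics.
        Integer getA() {
            return a_isNull ? null : a;
        }
    }

    public static void main(String[] args) {
        GeneratedRow row = new GeneratedRow();
        row.a_isNull = true;
        row.b = 42;
        row.c = "flink";
        System.out.println(row.getA()); // null
        System.out.println(row.b);      // 42
    }
}
```

The win over the generic `Row` is twofold: field access is a direct typed field read instead of an `Object[]` lookup plus cast, and non-null primitive columns never allocate.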
[jira] [Commented] (FLINK-9698) "case class must be static and globally accessible" is too constrained
[ https://issues.apache.org/jira/browse/FLINK-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336615#comment-17336615 ] Flink Jira Bot commented on FLINK-9698: --- This issue was labeled "stale-major" 7 days ago and has not received any updates, so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > "case class must be static and globally accessible" is too constrained > -- > > Key: FLINK-9698 > URL: https://issues.apache.org/jira/browse/FLINK-9698 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System >Reporter: Jeff Zhang >Priority: Major > Labels: stale-major > > The following code can reproduce this issue. > {code} > object BatchJob { > def main(args: Array[String]) { > // set up the batch execution environment > val env = ExecutionEnvironment.getExecutionEnvironment > val tenv = TableEnvironment.getTableEnvironment(env) > case class Person(id:Int, name:String) > val ds = env.fromElements(Person(1,"jeff"), Person(2, "andy")) > tenv.registerDataSet("table_1", ds); > } > } > {code} > Although I have a workaround of declaring the case class outside of the main > method, this workaround won't work in scala-shell. -- This message was sent by Atlassian Jira (v8.3.4#803005)
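Why frameworks impose this constraint can be illustrated with a Java analogue (this is not Flink's TypeExtractor code, just a sketch of the underlying reflection problem): a nested class that depends on an enclosing scope has no genuine no-arg constructor, so a framework that only holds the `Class` object cannot instantiate it. A case class declared inside a method is in the same situation, because its instances capture the enclosing scope.

```java
public class NestedClassReflection {

    // Analogue of a top-level / "globally accessible" case class:
    // constructible from the Class object alone.
    static class StaticNested { }

    // Analogue of a class declared inside an enclosing instance: its only
    // constructor secretly takes the outer instance as a parameter.
    class Inner { }

    public static void main(String[] args) throws Exception {
        Object ok = StaticNested.class.getDeclaredConstructor().newInstance();
        System.out.println(ok != null); // true

        try {
            Inner.class.getDeclaredConstructor().newInstance();
        } catch (NoSuchMethodException e) {
            // The no-arg constructor does not exist; only Inner(NestedClassReflection) does.
            System.out.println("no no-arg constructor available");
        }
    }
}
```

This is why the error message asks for a "static and globally accessible" class, though (as the ticket argues) the message itself could explain the restriction and its scala-shell implications far better.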
[jira] [Updated] (FLINK-11063) Make flink-table Scala-free
[ https://issues.apache.org/jira/browse/FLINK-11063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11063: --- Labels: auto-deprioritized-major (was: stale-major) > Make flink-table Scala-free > --- > > Key: FLINK-11063 > URL: https://issues.apache.org/jira/browse/FLINK-11063 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Affects Versions: 1.7.0 >Reporter: Timo Walther >Priority: Major > Labels: auto-deprioritized-major > > Currently, the Table & SQL API is implemented in Scala. This decision was > made a long time ago when the initial code base was created as part of a > master's thesis. The community kept Scala because of the nice language > features that enable a fluent Table API like {{table.select('field.trim())}} > and because Scala allows for quick prototyping (e.g. multi-line comments for > code generation). The committers enforced not splitting the code base into > two programming languages. > However, nowadays the {{flink-table}} module is becoming an increasingly > important part of the Flink ecosystem. Connectors, formats, and the SQL client > are actually implemented in Java but need to interoperate with > {{flink-table}}, which makes these modules dependent on Scala. As mentioned in > an earlier mail thread, using Scala for API classes also exposes member > variables and methods in Java that should not be exposed to users. Java is > still the most important API language, and right now we treat it as a > second-class citizen. > In order to not introduce more technical debt, the community aims to make the > {{flink-table}} module Scala-free as a long-term goal. This will be a > continuous effort that cannot be finished within one release. We aim to > avoid API-breaking changes. > A full description can be found in the corresponding > [FLIP-28|https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free]. 
> FLIP-28 also contains a rough roadmap and serves as migration guidelines. > This Jira issue is an umbrella issue for tracking the efforts and possible > migration blockers. > *+Update+*: Due to the big code contribution from Alibaba into Flink SQL, we > will only perform porting of API classes for now. This is mostly tracked by > FLINK-11448. > FLIP-28 is legacy and has been integrated into > [FLIP-32|https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions]. -- This message was sent by Atlassian Jira (v8.3.4#803005)