[jira] [Resolved] (FLINK-4879) class KafkaTableSource should be public just like KafkaTableSink
[ https://issues.apache.org/jira/browse/FLINK-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger resolved FLINK-4879.
-----------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 1.1.4)
                   1.2.0

Resolved for 1.2 in http://git-wip-us.apache.org/repos/asf/flink/commit/e3324372

> class KafkaTableSource should be public just like KafkaTableSink
> ----------------------------------------------------------------
>
>                 Key: FLINK-4879
>                 URL: https://issues.apache.org/jira/browse/FLINK-4879
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector, Table API & SQL
>    Affects Versions: 1.1.1, 1.1.3
>            Reporter: yuemeng
>            Priority: Minor
>             Fix For: 1.2.0
>
>         Attachments: 0001-class-KafkaTableSource-should-be-public.patch
>
>
> *Class KafkaTableSource should be public, just like KafkaTableSink. Its
> modifier is currently the default (package-private), so it cannot be accessed
> from outside its package.* For example:
> {code}
> def createKafkaTableSource(
>     topic: String,
>     properties: Properties,
>     deserializationSchema: DeserializationSchema[Row],
>     fieldsNames: Array[String],
>     typeInfo: Array[TypeInformation[_]]): KafkaTableSource = {
>   if (deserializationSchema != null) {
>     new Kafka09TableSource(topic, properties, deserializationSchema, fieldsNames, typeInfo)
>   } else {
>     new Kafka09JsonTableSource(topic, properties, fieldsNames, typeInfo)
>   }
> }
> {code}
> Because the modifier of class KafkaTableSource is the default, we can't
> declare KafkaTableSource as the result type of this function; we must give
> the specific subclass type.
> If some other Kafka source extends KafkaTableSource and we are not sure which
> subclass of KafkaTableSource should be used, how can we specify the type?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
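The point of the report can be sketched in Java (not the Scala of the report). Every name below (`TableSourceVisibilityDemo`, `BaseSource`, `JsonSource`, `GenericSource`, `create`, `isPublic`) is a hypothetical stand-in and not a Flink class. A factory can only declare the common base type as its result if that type is visible to the caller; the reflection helper shows one way to inspect the modifier.

```java
import java.lang.reflect.Modifier;

class TableSourceVisibilityDemo {
    // Hypothetical stand-in for an abstract base class such as KafkaTableSource.
    // It is declared public, which is what the issue asks for.
    public abstract static class BaseSource {
        abstract String describe();
    }

    // Hypothetical subclasses; these keep the default (package-private) modifier.
    static class JsonSource extends BaseSource {
        String describe() { return "json"; }
    }

    static class GenericSource extends BaseSource {
        String describe() { return "generic"; }
    }

    // The declared result type is the base class, so callers need not name a subclass.
    static BaseSource create(boolean hasSchema) {
        return hasSchema ? new GenericSource() : new JsonSource();
    }

    // Reports whether a class carries the public modifier.
    static boolean isPublic(Class<?> type) {
        return Modifier.isPublic(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(create(true).describe());
        System.out.println("BaseSource public: " + isPublic(BaseSource.class));
        System.out.println("JsonSource public: " + isPublic(JsonSource.class));
    }
}
```

A single file cannot show the cross-package compile error itself; the sketch only illustrates the factory shape and the modifier check.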
[jira] [Updated] (FLINK-4876) Allow web interface to be bound to a specific ip/interface/inetHost
[ https://issues.apache.org/jira/browse/FLINK-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-4876:
----------------------------------
    Assignee: Bram Vogelaar

> Allow web interface to be bound to a specific ip/interface/inetHost
> -------------------------------------------------------------------
>
>                 Key: FLINK-4876
>                 URL: https://issues.apache.org/jira/browse/FLINK-4876
>             Project: Flink
>          Issue Type: Improvement
>          Components: Webfrontend
>    Affects Versions: 1.2.0, 1.1.2, 1.1.3
>            Reporter: Bram Vogelaar
>            Assignee: Bram Vogelaar
>            Priority: Minor
>
> Currently the web interface automatically binds to all interfaces on 0.0.0.0.
> IMHO there are some use cases for binding only to a specific IP address (e.g.
> access through an authenticated proxy, or not binding on the management or
> backup interface).
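The difference between binding to all interfaces and binding to one address can be sketched with the plain JDK socket API. This is not Flink's web frontend code; `BindAddressDemo` and `boundAddressKind` are hypothetical names, and the loopback address merely stands in for "a specific interface".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

class BindAddressDemo {
    // Opens a listening socket on the given address and reports what it is bound to.
    // Port 0 lets the operating system pick a free port; a null address means
    // "all interfaces" (the wildcard address, e.g. 0.0.0.0).
    static String boundAddressKind(InetAddress address) {
        try (ServerSocket socket = new ServerSocket(0, 50, address)) {
            InetAddress bound = socket.getInetAddress();
            if (bound.isAnyLocalAddress()) {
                return "all interfaces";
            }
            if (bound.isLoopbackAddress()) {
                return "loopback";
            }
            return "specific address";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(boundAddressKind(null));
        System.out.println(boundAddressKind(InetAddress.getLoopbackAddress()));
    }
}
```

A configurable bind address would presumably be parsed from a setting and passed where the sketch passes the loopback address; the name of such a setting is not given in the issue.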
[jira] [Commented] (FLINK-4876) Allow web interface to be bound to a specific ip/interface/inetHost
[ https://issues.apache.org/jira/browse/FLINK-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602064#comment-15602064 ]

Robert Metzger commented on FLINK-4876:
---------------------------------------

Thank you for working on this.
I gave you "Contributor" permissions in our JIRA so that you can assign issues yourself. (I've already assigned this one to you.)

> Allow web interface to be bound to a specific ip/interface/inetHost
> -------------------------------------------------------------------
>
>                 Key: FLINK-4876
>                 URL: https://issues.apache.org/jira/browse/FLINK-4876
>             Project: Flink
>          Issue Type: Improvement
>          Components: Webfrontend
>    Affects Versions: 1.2.0, 1.1.2, 1.1.3
>            Reporter: Bram Vogelaar
>            Priority: Minor
[jira] [Created] (FLINK-4905) Kafka test instability IllegalStateException: Client is not started
Robert Metzger created FLINK-4905:
-------------------------------------

             Summary: Kafka test instability IllegalStateException: Client is not started
                 Key: FLINK-4905
                 URL: https://issues.apache.org/jira/browse/FLINK-4905
             Project: Flink
          Issue Type: Bug
          Components: Kafka Connector
            Reporter: Robert Metzger


The following travis build (https://s3.amazonaws.com/archive.travis-ci.org/jobs/170365439/log.txt) failed because of this error

{code}
08:17:11,239 INFO  org.apache.flink.runtime.jobmanager.JobManager - Status of job 33ebdc0e7c91be186d80658ce3d17069 (Read some records to commit offsets to Kafka) changed to FAILING.
java.lang.RuntimeException: Error while confirming checkpoint
    at org.apache.flink.runtime.taskmanager.Task$4.run(Task.java:1040)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Client is not started
    at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
    at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141)
    at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99)
    at org.apache.flink.streaming.connectors.kafka.internals.ZookeeperOffsetHandler.setOffsetInZooKeeper(ZookeeperOffsetHandler.java:133)
    at org.apache.flink.streaming.connectors.kafka.internals.ZookeeperOffsetHandler.prepareAndCommitOffsets(ZookeeperOffsetHandler.java:93)
    at org.apache.flink.streaming.connectors.kafka.internals.Kafka08Fetcher.commitInternalOffsetsToKafka(Kafka08Fetcher.java:341)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.notifyCheckpointComplete(FlinkKafkaConsumerBase.java:421)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyOfCompletedCheckpoint(AbstractUdfStreamOperator.java:229)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:571)
    at org.apache.flink.runtime.taskmanager.Task$4.run(Task.java:1035)
    ... 5 more
08:17:11,241 INFO  org.apache.flink.runtime.taskmanager.Task - Attempting to cancel task Source: Custom Source -> Map -> Map -> Sink: Unnamed (1/3)
{code}
[jira] [Assigned] (FLINK-4050) FlinkKafkaProducer API Refactor
[ https://issues.apache.org/jira/browse/FLINK-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger reassigned FLINK-4050:
-------------------------------------
    Assignee: Robert Metzger

> FlinkKafkaProducer API Refactor
> -------------------------------
>
>                 Key: FLINK-4050
>                 URL: https://issues.apache.org/jira/browse/FLINK-4050
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>    Affects Versions: 1.0.3
>            Reporter: Elias Levy
>            Assignee: Robert Metzger
>
> The FlinkKafkaProducer API seems more difficult to use than it should be.
> The API requires you pass it a SerializationSchema or a
> KeyedSerializationSchema, but the Kafka producer already has a serialization
> API. Requiring a serializer in the Flink API precludes the use of the Kafka
> serializers. For instance, they preclude the use of the Confluent
> KafkaAvroSerializer class that makes use of the Confluent Schema Registry.
> Ideally, the serializer would be optional, so as to allow the Kafka producer
> serializers to handle the task.
>
> In addition, the KeyedSerializationSchema conflates message key extraction
> with key serialization. If the serializer were optional, to allow the Kafka
> producer serializers to take over, you'd still need to extract a key from the
> message.
>
> And given that the key may not be part of the message you want to write to
> Kafka, an upstream step may have to package the key with the message to make
> both available to the sink, for instance in a tuple. That means you also need
> to define a method to extract the message to write to Kafka from the element
> passed into the sink by Flink.
>
> In summary, there should be separation of extraction of the key and message
> from the element passed into the sink from serialization, and the
> serialization step should be optional.
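The separation proposed in the summary can be sketched as follows. This is not the FlinkKafkaProducer API; `KeyValueExtractionDemo`, `OutRecord` and `toRecord` are hypothetical names, a `Map.Entry` stands in for the tuple that carries key and message, and a `null` serializer stands in for "let the Kafka producer's own serializers handle it".

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

class KeyValueExtractionDemo {
    // Hypothetical record handed on to a producer: key and value, serialized or not.
    static final class OutRecord {
        final Object key;
        final Object value;

        OutRecord(Object key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    // Extraction of key and message is always required; serialization is optional.
    // A null serializer means the extracted objects are passed on unchanged.
    static <T, K, V> OutRecord toRecord(T element,
                                        Function<T, K> keyExtractor,
                                        Function<T, V> valueExtractor,
                                        Function<Object, byte[]> serializer) {
        K key = keyExtractor.apply(element);
        V value = valueExtractor.apply(element);
        if (serializer == null) {
            return new OutRecord(key, value);
        }
        return new OutRecord(serializer.apply(key), serializer.apply(value));
    }

    public static void main(String[] args) {
        // An upstream step has packaged the key together with the message.
        Map.Entry<String, String> element = new SimpleEntry<>("user-1", "hello");

        OutRecord raw = toRecord(element, Map.Entry::getKey, Map.Entry::getValue, null);
        System.out.println(raw.key + " -> " + raw.value);

        Function<Object, byte[]> utf8 = o -> o.toString().getBytes(StandardCharsets.UTF_8);
        OutRecord bytes = toRecord(element, Map.Entry::getKey, Map.Entry::getValue, utf8);
        System.out.println(((byte[]) bytes.value).length + " value bytes");
    }
}
```

Keeping the extractors separate from the serializer means the same extractors work whether serialization happens in the sink or in the producer.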
[jira] [Commented] (FLINK-4905) Kafka test instability IllegalStateException: Client is not started
[ https://issues.apache.org/jira/browse/FLINK-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608006#comment-15608006 ]

Robert Metzger commented on FLINK-4905:
---------------------------------------

I'm not 100% sure if this can happen, because {{close()}} and {{notifyCheckpointComplete}} are already synchronized.
The lock for close is here: https://github.com/apache/flink/blob/770f2f83a81b2810aff171b2f56390ef686f725a/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L279
The lock for the notify is here: https://github.com/apache/flink/blob/770f2f83a81b2810aff171b2f56390ef686f725a/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L565

> Kafka test instability IllegalStateException: Client is not started
> -------------------------------------------------------------------
>
>                 Key: FLINK-4905
>                 URL: https://issues.apache.org/jira/browse/FLINK-4905
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>            Reporter: Robert Metzger
>              Labels: test-stability
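As a general illustration of the locking question raised in the comment, the sketch below shows two methods that share one lock. The lock makes the calls mutually exclusive but does not order them, so a notification can still arrive after `close()`; an explicit state check is what turns the late call into a no-op. This is not the StreamTask or Kafka connector code, the names (`OffsetCommitterDemo`, `closed`, `commits`) are hypothetical, and whether this ordering is the cause of the reported failure is not established in the thread.

```java
class OffsetCommitterDemo {
    private final Object lock = new Object();
    private boolean closed = false;   // guarded by lock
    private int commits = 0;          // guarded by lock

    // Stands in for a checkpoint-complete notification that commits offsets.
    // Returns false when the notification arrives after close() and is skipped.
    boolean notifyCheckpointComplete(long checkpointId) {
        synchronized (lock) {
            if (closed) {
                return false; // the client is already shut down, do not touch it
            }
            commits++;
            return true;
        }
    }

    // Stands in for shutting down the underlying client.
    void close() {
        synchronized (lock) {
            closed = true;
        }
    }

    int commits() {
        synchronized (lock) {
            return commits;
        }
    }

    public static void main(String[] args) {
        OffsetCommitterDemo committer = new OffsetCommitterDemo();
        System.out.println(committer.notifyCheckpointComplete(1L));
        committer.close();
        System.out.println(committer.notifyCheckpointComplete(2L));
        System.out.println(committer.commits());
    }
}
```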
[jira] [Created] (FLINK-4941) Show ship strategy in web interface
Robert Metzger created FLINK-4941:
-------------------------------------

             Summary: Show ship strategy in web interface
                 Key: FLINK-4941
                 URL: https://issues.apache.org/jira/browse/FLINK-4941
             Project: Flink
          Issue Type: Bug
          Components: Webfrontend
            Reporter: Robert Metzger
            Assignee: Robert Metzger


Currently, only Flink's Plan visualizer shows the ship strategy of a streaming job; the web interface does not.
[jira] [Updated] (FLINK-4941) Show ship strategy in web interface
[ https://issues.apache.org/jira/browse/FLINK-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-4941:
----------------------------------
    Attachment: ship_strategy_bad.png
                ship_strategy_good.png

I attached two screenshots.

> Show ship strategy in web interface
> -----------------------------------
>
>                 Key: FLINK-4941
>                 URL: https://issues.apache.org/jira/browse/FLINK-4941
>             Project: Flink
>          Issue Type: Bug
>          Components: Webfrontend
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>         Attachments: ship_strategy_bad.png, ship_strategy_good.png
[jira] [Assigned] (FLINK-4973) Flakey Yarn tests due to recently added latency marker
[ https://issues.apache.org/jira/browse/FLINK-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger reassigned FLINK-4973:
-------------------------------------
    Assignee: Robert Metzger

> Flakey Yarn tests due to recently added latency marker
> ------------------------------------------------------
>
>                 Key: FLINK-4973
>                 URL: https://issues.apache.org/jira/browse/FLINK-4973
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 1.2.0
>            Reporter: Till Rohrmann
>            Assignee: Robert Metzger
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.2.0
>
>
> The newly introduced {{LatencyMarksEmitter}} emits latency markers on the
> {{Output}}. This can still happen after the underlying {{BufferPool}} has
> been destroyed. The resulting exception is then logged:
> {code}
> 2016-10-29 15:00:48,088 INFO  org.apache.flink.runtime.taskmanager.Task - Source: Custom File Source (1/1) switched to FINISHED
> 2016-10-29 15:00:48,089 INFO  org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: Custom File Source (1/1)
> 2016-10-29 15:00:48,089 INFO  org.apache.flink.yarn.YarnTaskManager - Un-registering task and sending final execution state FINISHED to JobManager for task Source: Custom File Source (8fe0f817fa6d960ea33f6e57e0c3891c)
> 2016-10-29 15:00:48,101 WARN  org.apache.flink.streaming.api.operators.AbstractStreamOperator - Error while emitting latency marker
> java.lang.RuntimeException: Buffer pool is destroyed.
>     at org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitLatencyMarker(RecordWriterOutput.java:99)
>     at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.emitLatencyMarker(AbstractStreamOperator.java:734)
>     at org.apache.flink.streaming.api.operators.StreamSource$LatencyMarksEmitter$1.run(StreamSource.java:134)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Buffer pool is destroyed.
>     at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:144)
>     at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:133)
>     at org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:118)
>     at org.apache.flink.runtime.io.network.api.writer.RecordWriter.randomEmit(RecordWriter.java:103)
>     at org.apache.flink.streaming.runtime.io.StreamRecordWriter.randomEmit(StreamRecordWriter.java:104)
>     at org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitLatencyMarker(RecordWriterOutput.java:96)
>     ... 9 more
> {code}
> This exception is clearly related to the shutdown of a stream operator and
> does not indicate wrong behaviour. Since the yarn tests simply scan the log
> for some keywords (including "exception"), such a case can make them fail.
> It would be best to make sure that the {{LatencyMarksEmitter}} only emits
> latency markers while the {{Output}} is still active. But we could also
> simply not log exceptions which occurred after the stream operator has been
> stopped.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/171578846/log.txt
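The first option in the report (emit only while the output is still active) can be sketched with a periodic task guarded by a flag. This is not the `LatencyMarksEmitter` implementation; `LatencyEmitterDemo`, `outputActive` and `emitIfActive` are hypothetical names, and a counter stands in for writing a marker to the output. Emitting and closing are both `synchronized`, so a marker is never emitted once `close()` has returned.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class LatencyEmitterDemo {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final ScheduledFuture<?> task;
    private boolean outputActive = true; // guarded by "this"
    private int emitted = 0;             // guarded by "this"

    LatencyEmitterDemo(long periodMillis) {
        // Periodically try to emit a marker, starting immediately.
        task = timer.scheduleAtFixedRate(this::emitIfActive, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Emits only while the output is active; a late timer tick becomes a no-op.
    synchronized void emitIfActive() {
        if (outputActive) {
            emitted++; // stands in for writing a latency marker to the output
        }
    }

    // Deactivates the output first, then stops the timer.
    synchronized void close() {
        outputActive = false;
        task.cancel(false);
        timer.shutdownNow();
    }

    synchronized int emitted() {
        return emitted;
    }

    public static void main(String[] args) throws InterruptedException {
        LatencyEmitterDemo emitter = new LatencyEmitterDemo(10);
        Thread.sleep(50);
        emitter.close();
        int afterClose = emitter.emitted();
        emitter.emitIfActive(); // simulates a tick that fires after shutdown
        System.out.println(afterClose == emitter.emitted());
    }
}
```

The second option from the report, suppressing log output for exceptions raised after the operator has stopped, would leave the emit path unchanged and only alter the catch block.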
[jira] [Commented] (FLINK-4973) Flakey Yarn tests due to recently added latency marker
[ https://issues.apache.org/jira/browse/FLINK-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622478#comment-15622478 ]

Robert Metzger commented on FLINK-4973:
---------------------------------------

Thank you for reporting this issue. I'll look into it.

> Flakey Yarn tests due to recently added latency marker
> ------------------------------------------------------
>
>                 Key: FLINK-4973
>                 URL: https://issues.apache.org/jira/browse/FLINK-4973
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 1.2.0
>            Reporter: Till Rohrmann
>            Assignee: Robert Metzger
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.2.0
[jira] [Created] (FLINK-4974) RescalingITCase.testSavepointRescalingInPartitionedOperatorState unstable
Robert Metzger created FLINK-4974:
-------------------------------------

             Summary: RescalingITCase.testSavepointRescalingInPartitionedOperatorState unstable
                 Key: FLINK-4974
                 URL: https://issues.apache.org/jira/browse/FLINK-4974
             Project: Flink
          Issue Type: Bug
            Reporter: Robert Metzger


{code}
testSavepointRescalingInPartitionedOperatorState(org.apache.flink.test.checkpointing.RescalingITCase)  Time elapsed: 2.761 sec  <<< FAILURE!
java.lang.AssertionError: expected:<24> but was:<72>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:834)
    at org.junit.Assert.assertEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:631)
    at org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingPartitionedOperatorState(RescalingITCase.java:560)
    at org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingInPartitionedOperatorState(RescalingITCase.java:445)
{code}

in https://s3.amazonaws.com/archive.travis-ci.org/jobs/171445547/log.txt
[jira] [Updated] (FLINK-4945) KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
[ https://issues.apache.org/jira/browse/FLINK-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-4945:
----------------------------------
    Component/s: Kafka Connector

> KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
> --------------------------------------------------------------------------
>
>                 Key: FLINK-4945
>                 URL: https://issues.apache.org/jira/browse/FLINK-4945
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> Checkpoints are currently not registered in all cases. While the code still
> behaves correctly, this leads to misleading warnings.
[jira] [Updated] (FLINK-4945) KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
[ https://issues.apache.org/jira/browse/FLINK-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-4945:
----------------------------------
    Fix Version/s: 1.2.0

> KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
> --------------------------------------------------------------------------
>
>                 Key: FLINK-4945
>                 URL: https://issues.apache.org/jira/browse/FLINK-4945
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Minor
>             Fix For: 1.2.0
[jira] [Resolved] (FLINK-4945) KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
[ https://issues.apache.org/jira/browse/FLINK-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger resolved FLINK-4945.
-----------------------------------
    Resolution: Fixed

Thank you for fixing the issue.
Merged to master in http://git-wip-us.apache.org/repos/asf/flink/commit/223b0aa0

> KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
> --------------------------------------------------------------------------
>
>                 Key: FLINK-4945
>                 URL: https://issues.apache.org/jira/browse/FLINK-4945
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Minor
>             Fix For: 1.2.0
[jira] [Created] (FLINK-5001) Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
Robert Metzger created FLINK-5001:
-------------------------------------

             Summary: Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
                 Key: FLINK-5001
                 URL: https://issues.apache.org/jira/browse/FLINK-5001
             Project: Flink
          Issue Type: Bug
          Components: Kafka Connector
    Affects Versions: 1.1.0, 1.2.0
            Reporter: Robert Metzger
            Priority: Critical


Similarly to FLINK-4822, the offsets committed by Flink's Kafka 0.9+ consumer are not available through the {{kafka-consumer-groups.sh}} tool.
[jira] [Commented] (FLINK-5001) Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
[ https://issues.apache.org/jira/browse/FLINK-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632274#comment-15632274 ]

Robert Metzger commented on FLINK-5001:
---------------------------------------

The {{kafka.admin.ConsumerGroupCommand}} object contains the implementation of the {{kafka-consumer-groups.sh}} utility.
I ran it with a debugger attached, and it seems that the broker is not returning all consumer groups to the client.

> Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-5001
>                 URL: https://issues.apache.org/jira/browse/FLINK-5001
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Robert Metzger
>            Priority: Blocker
[jira] [Commented] (FLINK-5001) Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
[ https://issues.apache.org/jira/browse/FLINK-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15632307#comment-15632307 ]

Robert Metzger commented on FLINK-5001:
---------------------------------------

Okay, this is a limitation of the {{KafkaConsumer}}. Since we are manually assigning the partitions to consume, we are not participating in the group-balancing mechanism.
http://grokbase.com/t/kafka/users/163rrq9ne8/new-consumer-group-not-showing-up

> Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-5001
>                 URL: https://issues.apache.org/jira/browse/FLINK-5001
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Robert Metzger
>            Priority: Blocker
[jira] [Assigned] (FLINK-5001) Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
[ https://issues.apache.org/jira/browse/FLINK-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger reassigned FLINK-5001:
-------------------------------------
    Assignee: Robert Metzger

> Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-5001
>                 URL: https://issues.apache.org/jira/browse/FLINK-5001
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>            Priority: Blocker
[jira] [Created] (FLINK-5013) Flink Kinesis connector doesn't work on old EMR versions
Robert Metzger created FLINK-5013:
-------------------------------------

             Summary: Flink Kinesis connector doesn't work on old EMR versions
                 Key: FLINK-5013
                 URL: https://issues.apache.org/jira/browse/FLINK-5013
             Project: Flink
          Issue Type: Bug
          Components: Kinesis Connector
            Reporter: Robert Metzger


A user reported on the mailing list that our Kinesis connector doesn't work with EMR 4.4.0: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connector-Dependency-Problems-td9790.html

The problem seems to be that Flink is loading older libraries from the "YARN container classpath", which on EMR contains the default Amazon libraries.

We should try to shade Kinesis and its Amazon dependencies into a different namespace.
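When a class may be picked up from an unexpected place on the classpath, a small diagnostic can show where it was actually loaded from. The sketch below uses only the JDK; `ClassOriginDemo` and `originOf` are hypothetical names, and passing a class from the Amazon SDK instead of the demo class is an assumption about how one might apply it here. It does not perform any shading.

```java
import java.security.CodeSource;

class ClassOriginDemo {
    // Returns the location (jar or directory) a class was loaded from.
    // Classes without a code source, such as JDK core classes, yield a marker string.
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(originOf(ClassOriginDemo.class));
        System.out.println(originOf(String.class));
    }
}
```

If the printed location points into the container's default library directory rather than the job's jar, the older library is winning; relocating the dependency into a different package namespace avoids that clash.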
[jira] [Resolved] (FLINK-4221) Show metrics in WebFrontend
[ https://issues.apache.org/jira/browse/FLINK-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger resolved FLINK-4221.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/3a4fc537

> Show metrics in WebFrontend
> ---------------------------
>
>                 Key: FLINK-4221
>                 URL: https://issues.apache.org/jira/browse/FLINK-4221
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Metrics, Webfrontend
>    Affects Versions: 1.0.0
>            Reporter: Chesnay Schepler
>            Assignee: Robert Metzger
>             Fix For: 1.2.0, pre-apache
[jira] [Commented] (FLINK-5731) Split up CI builds
[ https://issues.apache.org/jira/browse/FLINK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860971#comment-15860971 ]

Robert Metzger commented on FLINK-5731:
---------------------------------------

We dropped Hadoop 1 support just recently with the 1.2 release, so we have a tradition of supporting older Hadoop versions for a long time.
The reasoning is that we don't want to exclude any new users because of their Hadoop version. As long as supporting old Hadoop versions doesn't add too much pain, we'll probably keep them.
If you really want to change this, you would probably need to start a discussion about it on the dev@ mailing list.

> Split up CI builds
> ------------------
>
>                 Key: FLINK-5731
>                 URL: https://issues.apache.org/jira/browse/FLINK-5731
>             Project: Flink
>          Issue Type: Improvement
>          Components: Build System, Tests
>            Reporter: Ufuk Celebi
>            Assignee: Robert Metzger
>            Priority: Critical
>
> Test builds regularly time out because we are hitting the Travis 50 min
> limit. Previously, we worked around this by splitting up the tests into
> groups. I think we have to split them further.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (FLINK-5316) Make the GenericWriteAheadSink backwards compatible.
[ https://issues.apache.org/jira/browse/FLINK-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862757#comment-15862757 ]

Robert Metzger commented on FLINK-5316:
---------------------------------------

I would suggest not implementing this unless a user asks for it.

> Make the GenericWriteAheadSink backwards compatible.
> ----------------------------------------------------
>
>                 Key: FLINK-5316
>                 URL: https://issues.apache.org/jira/browse/FLINK-5316
>             Project: Flink
>          Issue Type: Improvement
>          Components: Cassandra Connector
>    Affects Versions: 1.2.0
>            Reporter: Kostas Kloudas
>            Assignee: Kostas Kloudas
[jira] [Resolved] (FLINK-3985) A metric with the name * was already registered
[ https://issues.apache.org/jira/browse/FLINK-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-3985. --- Resolution: Fixed The issue is fixed in the YARN tests. > A metric with the name * was already registered > --- > > Key: FLINK-3985 > URL: https://issues.apache.org/jira/browse/FLINK-3985 > Project: Flink > Issue Type: Bug > Components: Metrics >Affects Versions: 1.1.0 >Reporter: Robert Metzger >Assignee: Stephan Ewen > Labels: test-stability > > The YARN tests detected the following failure while running WordCount. > {code} > 2016-05-27 21:50:48,230 INFO org.apache.flink.yarn.YarnTaskManager > - Received task CHAIN DataSource (at main(WordCount.java:70) > (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at > main(WordCount.java:80)) -> Combine(SUM(1), at main(WordCount.java:83) (1/2) > 2016-05-27 21:50:48,231 ERROR org.apache.flink.metrics.reporter.JMXReporter > - A metric with the name > org.apache.flink.metrics:key0=testing-worker-linux-docker-6e03e1e8-3385-linux-1,key1=taskmanager,key2=ee7c10183f32c9a96f8e7cfd873863d1,key3=WordCount_Example,key4=CHAIN_DataSource_(at_main(WordCount.java-70)_(org.apache.flink.api.java.io.TextInputFormat))_->_FlatMap_(FlatMap_at_main(WordCount.java-80))_->_Combine(SUM(1)-_at_main(WordCount.java-83),name=numBytesIn > was already registered. 
> javax.management.InstanceAlreadyExistsException: > org.apache.flink.metrics:key0=testing-worker-linux-docker-6e03e1e8-3385-linux-1,key1=taskmanager,key2=ee7c10183f32c9a96f8e7cfd873863d1,key3=WordCount_Example,key4=CHAIN_DataSource_(at_main(WordCount.java-70)_(org.apache.flink.api.java.io.TextInputFormat))_->_FlatMap_(FlatMap_at_main(WordCount.java-80))_->_Combine(SUM(1)-_at_main(WordCount.java-83),name=numBytesIn > at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) > at > org.apache.flink.metrics.reporter.JMXReporter.notifyOfAddedMetric(JMXReporter.java:76) > at > org.apache.flink.metrics.MetricRegistry.register(MetricRegistry.java:177) > at > org.apache.flink.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:191) > at > org.apache.flink.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:144) > at > org.apache.flink.metrics.groups.IOMetricGroup.<init>(IOMetricGroup.java:40) > at > org.apache.flink.metrics.groups.TaskMetricGroup.<init>(TaskMetricGroup.java:74) > at > org.apache.flink.metrics.groups.JobMetricGroup.addTask(JobMetricGroup.java:74) > at > org.apache.flink.metrics.groups.TaskManagerMetricGroup.addTaskForJob(TaskManagerMetricGroup.java:86) > at > org.apache.flink.runtime.taskmanager.TaskManager.submitTask(TaskManager.scala:1093) > at > 
org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleTaskMessage(TaskManager.scala:442) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:284) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at akka.actor.Actor$class.aroundRece
[jira] [Commented] (FLINK-5690) protobuf is not shaded properly
[ https://issues.apache.org/jira/browse/FLINK-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862821#comment-15862821 ] Robert Metzger commented on FLINK-5690: --- You are right. My project contained the protobuf dependency. Are you okay with closing this JIRA as "won't fix"? We currently cannot shade protobuf away because of our dependency on Akka. > protobuf is not shaded properly > --- > > Key: FLINK-5690 > URL: https://issues.apache.org/jira/browse/FLINK-5690 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.1.4, 1.3.0 >Reporter: Andrey >Assignee: Robert Metzger > > Currently the distribution contains the com/google/protobuf package. Without proper > shading, client code could fail with: > {code} > Caused by: java.lang.IllegalAccessError: tried to access method > com.google.protobuf. > {code} > Steps to reproduce: > * create job class "com.google.protobuf.TestClass" > * call com.google.protobuf.TextFormat.escapeText(String) method from this > class > * deploy job to flink cluster (using the web console, for example) > * run job. The logs show an IllegalAccessError. > The issue is caused by a package-protected method and different classloaders: TestClass > is loaded by FlinkUserCodeClassLoader, but the TextFormat class is loaded by > sun.misc.Launcher$AppClassLoader -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5690) protobuf is not shaded properly
[ https://issues.apache.org/jira/browse/FLINK-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864266#comment-15864266 ] Robert Metzger commented on FLINK-5690: --- 1) I'll open a pull request. Regarding 2) This is something we could look into. However, so far, almost all complaints from users were about the dependencies we inherit from Hadoop. That's why I've tried relocating all of Hadoop's dependencies: https://issues.apache.org/jira/browse/FLINK-5297?focusedCommentId=15815050&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15815050 (with mixed results). Also, I'm planning to provide a "hadoop free" version of Flink. As more and more of our users actually run without anything from Hadoop (for example on AWS with S3 and Docker), these users don't need all the deps we inherit from Hadoop. The libraries you've mentioned are pretty good at maintaining API compatibility, so they are easier to mix in one classpath. > protobuf is not shaded properly > --- > > Key: FLINK-5690 > URL: https://issues.apache.org/jira/browse/FLINK-5690 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.1.4, 1.3.0 >Reporter: Andrey >Assignee: Robert Metzger > > Currently the distribution contains the com/google/protobuf package. Without proper > shading, client code could fail with: > {code} > Caused by: java.lang.IllegalAccessError: tried to access method > com.google.protobuf. > {code} > Steps to reproduce: > * create job class "com.google.protobuf.TestClass" > * call com.google.protobuf.TextFormat.escapeText(String) method from this > class > * deploy job to flink cluster (using the web console, for example) > * run job. The logs show an IllegalAccessError. > The issue is caused by a package-protected method and different classloaders: TestClass > is loaded by FlinkUserCodeClassLoader, but the TextFormat class is loaded by > sun.misc.Launcher$AppClassLoader -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5690) protobuf is not shaded properly
[ https://issues.apache.org/jira/browse/FLINK-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864312#comment-15864312 ] Robert Metzger commented on FLINK-5690: --- Feel free to review my pull request. > protobuf is not shaded properly > --- > > Key: FLINK-5690 > URL: https://issues.apache.org/jira/browse/FLINK-5690 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.1.4, 1.3.0 >Reporter: Andrey >Assignee: Robert Metzger > > Currently the distribution contains the com/google/protobuf package. Without proper > shading, client code could fail with: > {code} > Caused by: java.lang.IllegalAccessError: tried to access method > com.google.protobuf. > {code} > Steps to reproduce: > * create job class "com.google.protobuf.TestClass" > * call com.google.protobuf.TextFormat.escapeText(String) method from this > class > * deploy job to flink cluster (using the web console, for example) > * run job. The logs show an IllegalAccessError. > The issue is caused by a package-protected method and different classloaders: TestClass > is loaded by FlinkUserCodeClassLoader, but the TextFormat class is loaded by > sun.misc.Launcher$AppClassLoader -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5736) Fix Javadocs / Scaladocs build (Feb 2017)
[ https://issues.apache.org/jira/browse/FLINK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865758#comment-15865758 ] Robert Metzger commented on FLINK-5736: --- I've fixed the javadocs for the 1.2 build: https://ci.apache.org/projects/flink/flink-docs-release-1.2/api/java/ Tonight, it'll also be rebuilt for the other versions. > Fix Javadocs / Scaladocs build (Feb 2017) > - > > Key: FLINK-5736 > URL: https://issues.apache.org/jira/browse/FLINK-5736 > Project: Flink > Issue Type: Task > Components: Build System, Documentation >Reporter: Robert Metzger > > It looks like the scaladocs are not building properly. > Expected URL: > https://ci.apache.org/projects/flink/flink-docs-release-1.2/api/java/ > Buildbot output: > https://ci.apache.org/builders/flink-docs-release-1.2/builds/15/steps/Java%20%26%20Scala%20docs/logs/stdio > Command to reproduce issue on master: mvn clean javadoc:aggregate > -Paggregate-scaladoc -DadditionalJOption="-Xdoclint:none" -Dheader=" href="http://flink.apache.org/"; target="_top">Back to Flink > Website" -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (FLINK-5736) Fix Javadocs / Scaladocs build (Feb 2017)
[ https://issues.apache.org/jira/browse/FLINK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-5736: - Assignee: Robert Metzger > Fix Javadocs / Scaladocs build (Feb 2017) > - > > Key: FLINK-5736 > URL: https://issues.apache.org/jira/browse/FLINK-5736 > Project: Flink > Issue Type: Task > Components: Build System, Documentation >Reporter: Robert Metzger >Assignee: Robert Metzger > > It looks like the scaladocs are not building properly. > Expected URL: > https://ci.apache.org/projects/flink/flink-docs-release-1.2/api/java/ > Buildbot output: > https://ci.apache.org/builders/flink-docs-release-1.2/builds/15/steps/Java%20%26%20Scala%20docs/logs/stdio > Command to reproduce issue on master: mvn clean javadoc:aggregate > -Paggregate-scaladoc -DadditionalJOption="-Xdoclint:none" -Dheader=" href="http://flink.apache.org/"; target="_top">Back to Flink > Website" -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (FLINK-5736) Fix Javadocs / Scaladocs build (Feb 2017)
[ https://issues.apache.org/jira/browse/FLINK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-5736. --- Resolution: Fixed > Fix Javadocs / Scaladocs build (Feb 2017) > - > > Key: FLINK-5736 > URL: https://issues.apache.org/jira/browse/FLINK-5736 > Project: Flink > Issue Type: Task > Components: Build System, Documentation >Reporter: Robert Metzger >Assignee: Robert Metzger > > It looks like the scaladocs are not building properly. > Expected URL: > https://ci.apache.org/projects/flink/flink-docs-release-1.2/api/java/ > Buildbot output: > https://ci.apache.org/builders/flink-docs-release-1.2/builds/15/steps/Java%20%26%20Scala%20docs/logs/stdio > Command to reproduce issue on master: mvn clean javadoc:aggregate > -Paggregate-scaladoc -DadditionalJOption="-Xdoclint:none" -Dheader=" href="http://flink.apache.org/"; target="_top">Back to Flink > Website" -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5736) Fix Javadocs / Scaladocs build (Feb 2017)
[ https://issues.apache.org/jira/browse/FLINK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865766#comment-15865766 ] Robert Metzger commented on FLINK-5736: --- The problem was the following: On the buildbot server (configuration is located here: https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects) there is a script that generates the javadocs. The script checks if the build server (the servers are apparently not homogeneous) has Java8 or not. If Java8 is present, {{-DadditionalJOption="-Xdoclint:none"}} is passed to the Maven invocation. The problem was that the Java version check was not working, hence the flag was not passed. It seems that this doclint option does not only affect HTML errors (or wrong links etc.). It also affects other issues during the javadocs compilation. With the doclint option set, basically all javadoc errors turn into warnings. As a short-term fix, I'm now always passing the doclint option. > Fix Javadocs / Scaladocs build (Feb 2017) > - > > Key: FLINK-5736 > URL: https://issues.apache.org/jira/browse/FLINK-5736 > Project: Flink > Issue Type: Task > Components: Build System, Documentation >Reporter: Robert Metzger >Assignee: Robert Metzger > > It looks like the scaladocs are not building properly. > Expected URL: > https://ci.apache.org/projects/flink/flink-docs-release-1.2/api/java/ > Buildbot output: > https://ci.apache.org/builders/flink-docs-release-1.2/builds/15/steps/Java%20%26%20Scala%20docs/logs/stdio > Command to reproduce issue on master: mvn clean javadoc:aggregate > -Paggregate-scaladoc -DadditionalJOption="-Xdoclint:none" -Dheader=" href="http://flink.apache.org/"; target="_top">Back to Flink > Website" -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5690) protobuf is not shaded properly
[ https://issues.apache.org/jira/browse/FLINK-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865813#comment-15865813 ] Robert Metzger commented on FLINK-5690: --- Merged documentation update to master in http://git-wip-us.apache.org/repos/asf/flink/commit/d3228144 > protobuf is not shaded properly > --- > > Key: FLINK-5690 > URL: https://issues.apache.org/jira/browse/FLINK-5690 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.1.4, 1.3.0 >Reporter: Andrey >Assignee: Robert Metzger > > Currently the distribution contains the com/google/protobuf package. Without proper > shading, client code could fail with: > {code} > Caused by: java.lang.IllegalAccessError: tried to access method > com.google.protobuf. > {code} > Steps to reproduce: > * create job class "com.google.protobuf.TestClass" > * call com.google.protobuf.TextFormat.escapeText(String) method from this > class > * deploy job to flink cluster (using the web console, for example) > * run job. The logs show an IllegalAccessError. > The issue is caused by a package-protected method and different classloaders: TestClass > is loaded by FlinkUserCodeClassLoader, but the TextFormat class is loaded by > sun.misc.Launcher$AppClassLoader -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5796) broken links in the docs
[ https://issues.apache.org/jira/browse/FLINK-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865894#comment-15865894 ] Robert Metzger commented on FLINK-5796: --- Sorry, I don't have time for this right now. I think we can integrate this check into the build docs script. This way, we don't need to work on buildbot. (ideally this is put into a separate script that is called from both the build docs script and buildbot) (I can do the call in buildbot) > broken links in the docs > > > Key: FLINK-5796 > URL: https://issues.apache.org/jira/browse/FLINK-5796 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.3.0 >Reporter: Nico Kruber > > running a link checker on the locally-served flink docs yields the following > broken links (same for online docs): > {code} > 15:21:55 Error: "Not Found " (404) at link 0.0.0.0:4000/dev/state (from > 0.0.0.0:4000/dev/datastream_api.html) > 15:21:55 Error: "Not Found " (404) at link > 0.0.0.0:4000/monitoring/best_practices.html (from 0.0.0.0:4000/dev/batch/) > 15:21:55 Error: "Not Found " (404) at link > 0.0.0.0:4000/api/java/org/apache/flink/table/api/Table.html (from > 0.0.0.0:4000/dev/table_api.html) > 15:21:55 Error: "Not Found " (404) at link 0.0.0.0:4000/dev/state.html > (from 0.0.0.0:4000/ops/state_backends.html) > 15:21:55 Error: "Not Found " (404) at link > 0.0.0.0:4000/internals/state_backends.html (from > 0.0.0.0:4000/internals/stream_checkpointing.html) > {code} > FYI: command to replay: > {{httrack http://0.0.0.0:4000/ -O "$PWD/flink-docs" --testlinks -%v > --depth= --ext-depth=0}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5831) Sort metrics in metric selector and add search box
Robert Metzger created FLINK-5831: - Summary: Sort metrics in metric selector and add search box Key: FLINK-5831 URL: https://issues.apache.org/jira/browse/FLINK-5831 Project: Flink Issue Type: Improvement Components: Webfrontend Reporter: Robert Metzger The JobManager UI makes it hard to select metrics using the drop down menu. First of all, it would be nice to sort all entries. Also, a search box on top of the drop down would make it much easier to find the metrics. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-5831) Sort metrics in metric selector and add search box
[ https://issues.apache.org/jira/browse/FLINK-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5831: -- Attachment: dropDown.png I've attached the drop down I'm talking about as a screenshot. > Sort metrics in metric selector and add search box > -- > > Key: FLINK-5831 > URL: https://issues.apache.org/jira/browse/FLINK-5831 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Reporter: Robert Metzger > Attachments: dropDown.png > > > The JobManager UI makes it hard to select metrics using the drop down menu. > First of all, it would be nice to sort all entries. Also, a search box on top > of the drop down would make it much easier to find the metrics. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (FLINK-5487) Proper at-least-once support for ElasticsearchSink
[ https://issues.apache.org/jira/browse/FLINK-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-5487: - Assignee: Tzu-Li (Gordon) Tai > Proper at-least-once support for ElasticsearchSink > -- > > Key: FLINK-5487 > URL: https://issues.apache.org/jira/browse/FLINK-5487 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Critical > > Discussion in ML: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fault-tolerance-guarantees-of-Elasticsearch-sink-in-flink-elasticsearch2-td10982.html > Currently, the Elasticsearch Sink actually doesn't offer any guarantees for > message delivery. > For proper support of at-least-once, the sink will need to participate in > Flink's checkpointing: when snapshotting is triggered at the > {{ElasticsearchSink}}, we need to synchronize on the pending ES requests by > flushing the internal bulk processor. For temporary ES failures (see > FLINK-5122) that may happen on the flush, we should retry them before > returning from snapshotting and acking the checkpoint. If there are > non-temporary ES failures on the flush, the current snapshot should fail. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
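The flush-on-snapshot behaviour described in FLINK-5487 can be illustrated with a small, self-contained sketch. This is not the ElasticsearchSink implementation and uses no Flink or Elasticsearch classes: `Backend`, `TemporaryFailure`, `invoke`, `snapshot` and `maxRetries` are hypothetical names chosen for illustration only. The sketch assumes that a snapshot may only return once every buffered request has been sent, that temporary failures are retried a bounded number of times, and that any other failure makes the snapshot fail.

```java
import java.util.ArrayList;
import java.util.List;

final class FlushOnSnapshotSketch {
    /** Hypothetical stand-in for a temporary backend failure. */
    static final class TemporaryFailure extends Exception {}

    /** Hypothetical stand-in for the client that ships a batch of requests. */
    interface Backend {
        void send(List<String> batch) throws Exception;
    }

    private final List<String> pending = new ArrayList<>();
    private final Backend backend;
    private final int maxRetries;

    FlushOnSnapshotSketch(Backend backend, int maxRetries) {
        this.backend = backend;
        this.maxRetries = maxRetries;
    }

    /** Buffers a record; nothing is guaranteed to be delivered yet. */
    void invoke(String record) {
        pending.add(record);
    }

    /** Returns only after all pending records were sent; throws to fail the snapshot. */
    void snapshot() throws Exception {
        int attempts = 0;
        while (!pending.isEmpty()) {
            try {
                backend.send(new ArrayList<>(pending));
                pending.clear();
            } catch (TemporaryFailure e) {
                // Retry temporary failures, but give up after maxRetries retries.
                if (++attempts > maxRetries) {
                    throw e;
                }
            }
            // Any other exception propagates immediately and fails the snapshot.
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) throws Exception {
        List<String> delivered = new ArrayList<>();
        FlushOnSnapshotSketch sink = new FlushOnSnapshotSketch(delivered::addAll, 3);
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshot();
        System.out.println("delivered=" + delivered + ", pending=" + sink.pendingCount());
    }
}
```

The point of the design is that acknowledging a checkpoint while requests are still buffered would lose them on recovery, so the flush has to complete (or fail) before the snapshot call returns.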
[jira] [Created] (FLINK-5874) Reject arrays as keys in DataStream API to avoid inconsistent hashing
Robert Metzger created FLINK-5874: - Summary: Reject arrays as keys in DataStream API to avoid inconsistent hashing Key: FLINK-5874 URL: https://issues.apache.org/jira/browse/FLINK-5874 Project: Flink Issue Type: Bug Components: DataStream API Affects Versions: 1.1.4, 1.2.0 Reporter: Robert Metzger Priority: Blocker This issue has been reported on the mailing list twice: - http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Previously-working-job-fails-on-Flink-1-2-0-td11741.html - http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Arrays-values-in-keyBy-td7530.html The problem is the following: We are using just Key[].hashCode() to compute the hash when shuffling data. Java's default hashCode() implementation doesn't take the array's contents into account, but the object's identity (roughly, its memory address). This leads to different hash codes on the sender and receiver side. In Flink 1.1 this means that the data is shuffled randomly and not keyed, and in Flink 1.2 the keygroups code detects a violation of the hashing. The proper fix of the problem would be to rely on Flink's {{TypeComparator}} class, which has a type-specific hashing function. But introducing this change would break compatibility with existing code. I'll file a JIRA for the 2.0 changes for that fix. For 1.2.1 and 1.3.0 we should at least reject arrays as keys. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
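The hashing problem described in FLINK-5874 can be reproduced in plain Java without Flink. The sketch below compares the identity-based `hashCode()` that arrays inherit from `Object` with the content-based `java.util.Arrays.hashCode`. The class and method names are made up for illustration, and `Arrays.hashCode` is only a stand-in for a type-specific hash; it is not Flink's {{TypeComparator}}.

```java
import java.util.Arrays;

final class ArrayKeyHashing {
    /** Hash inherited from Object: based on identity, ignores the array contents. */
    static int identityHash(byte[] key) {
        return key.hashCode();
    }

    /** Content-based hash: arrays with equal contents always hash equally. */
    static int contentHash(byte[] key) {
        return Arrays.hashCode(key);
    }

    public static void main(String[] args) {
        byte[] sent = {1, 2, 3};
        // Same contents in a different object, as on the receiver after deserialization.
        byte[] received = {1, 2, 3};

        System.out.println("contents equal:        " + Arrays.equals(sent, received));
        // Usually false, and never something a shuffle can rely on.
        System.out.println("identity hashes equal: " + (identityHash(sent) == identityHash(received)));
        System.out.println("content hashes equal:  " + (contentHash(sent) == contentHash(received)));
    }
}
```

Because the identity hash differs between two JVMs (and even between two copies in one JVM), a partitioner that uses it cannot route equal keys to the same receiver, which matches the behaviour reported on the mailing list.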
[jira] [Updated] (FLINK-5875) Use TypeComparator.hash() instead of Object.hashCode() for keying in DataStream API
[ https://issues.apache.org/jira/browse/FLINK-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5875: -- Summary: Use TypeComparator.hash() instead of Object.hashCode() for keying in DataStream API (was: Use TypeComparator.hash instead of Object.hashCode() for keying in DataStream API) > Use TypeComparator.hash() instead of Object.hashCode() for keying in > DataStream API > --- > > Key: FLINK-5875 > URL: https://issues.apache.org/jira/browse/FLINK-5875 > Project: Flink > Issue Type: Sub-task > Components: DataStream API >Reporter: Robert Metzger > Fix For: 2.0.0 > > > See FLINK-5874 for details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5875) Use TypeComparator.hash instead of Object.hashCode() for keying in DataStream API
Robert Metzger created FLINK-5875: - Summary: Use TypeComparator.hash instead of Object.hashCode() for keying in DataStream API Key: FLINK-5875 URL: https://issues.apache.org/jira/browse/FLINK-5875 Project: Flink Issue Type: Sub-task Components: DataStream API Reporter: Robert Metzger Fix For: 2.0.0 See FLINK-5874 for details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (FLINK-5731) Split up CI builds
[ https://issues.apache.org/jira/browse/FLINK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-5731. --- Resolution: Fixed Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/e7a914d4 > Split up CI builds > -- > > Key: FLINK-5731 > URL: https://issues.apache.org/jira/browse/FLINK-5731 > Project: Flink > Issue Type: Improvement > Components: Build System, Tests >Reporter: Ufuk Celebi >Assignee: Robert Metzger >Priority: Critical > > Test builds regularly time out because we are hitting the Travis 50 min > limit. Previously, we worked around this by splitting up the tests into > groups. I think we have to split them further. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5898) Race-Condition with Amazon Kinesis KPL
[ https://issues.apache.org/jira/browse/FLINK-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882319#comment-15882319 ] Robert Metzger commented on FLINK-5898: --- Thank you Scott for looking into this! Fixing it at the KPL is probably the easiest. If that doesn't work, we could consider temporarily changing the "java.io.tmpdir" system property to include a random UUID. > Race-Condition with Amazon Kinesis KPL > -- > > Key: FLINK-5898 > URL: https://issues.apache.org/jira/browse/FLINK-5898 > Project: Flink > Issue Type: Bug > Components: Kinesis Connector >Affects Versions: 1.2.0 >Reporter: Scott Kidder > > The Flink Kinesis streaming-connector uses the Amazon Kinesis Producer > Library (KPL) to send messages to Kinesis streams. The KPL relies on a native > binary client to send messages to achieve better performance. > When a Kinesis Producer is instantiated, the KPL will extract the native > binary to a sub-directory of `/tmp` (or whatever the platform-specific > temporary directory happens to be). > The KPL tries to prevent multiple processes from extracting the binary at the > same time by wrapping the operation in a mutex. Unfortunately, this does not > prevent multiple Flink cores from trying to perform this operation at the > same time. If two or more processes attempt to do this at the same time, then > the native binary in /tmp will be corrupted. > The authors of the KPL are aware of this possibility and suggest that users > of the KPL not do that ... (sigh): > https://github.com/awslabs/amazon-kinesis-producer/issues/55#issuecomment-251408897 > I encountered this in my production environment when bringing up a new Flink > task-manager with multiple cores and restoring from an earlier savepoint, > resulting in the instantiation of a KPL client on each core at roughly the > same time. 
> A stack-trace follows: > {noformat} > java.lang.RuntimeException: Could not copy native binaries to temp directory > /tmp/amazon-kinesis-producer-native-binaries > at > com.amazonaws.services.kinesis.producer.KinesisProducer.extractBinaries(KinesisProducer.java:849) > at > com.amazonaws.services.kinesis.producer.KinesisProducer.<init>(KinesisProducer.java:243) > at > org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer.open(FlinkKinesisProducer.java:198) > at > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) > at > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:112) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.SecurityException: The contents of the binary > /tmp/amazon-kinesis-producer-native-binaries/kinesis_producer_e9a87c761db92a73eb74519a4468ee71def87eb2 > is not what it's expected to be. > at > com.amazonaws.services.kinesis.producer.KinesisProducer.extractBinaries(KinesisProducer.java:822) > ... 8 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
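The fallback mentioned in the comment on FLINK-5898 (temporarily pointing "java.io.tmpdir" at a directory that includes a random UUID) could look roughly like the sketch below. It is only an illustration: `ScopedTmpDir`, `withUniqueTmpDir` and the "flink-kpl-" prefix are hypothetical names, and whether the KPL reads the property at the moment the producer is instantiated is an assumption that would need to be checked against the KPL itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;
import java.util.function.Supplier;

final class ScopedTmpDir {
    private static final String KEY = "java.io.tmpdir";

    /**
     * Runs the action while java.io.tmpdir points at a fresh, unique sub-directory,
     * then restores the previous value. Synchronized so that callers in the same JVM
     * do not overwrite each other's temporary setting.
     */
    static synchronized <T> T withUniqueTmpDir(Supplier<T> action) throws IOException {
        String original = System.getProperty(KEY); // set by every standard JVM
        Path unique = Paths.get(original, "flink-kpl-" + UUID.randomUUID());
        Files.createDirectories(unique);
        System.setProperty(KEY, unique.toString());
        try {
            return action.get();
        } finally {
            System.setProperty(KEY, original);
        }
    }

    public static void main(String[] args) throws IOException {
        String seenInside = withUniqueTmpDir(() -> System.getProperty(KEY));
        System.out.println("inside: " + seenInside);
        System.out.println("after:  " + System.getProperty(KEY));
    }
}
```

System properties are JVM-wide, so unrelated threads that read the property while the action runs will also see the temporary value, and some JDK classes cache the temp directory on first use. That is one reason why fixing the extraction in the KPL itself is the cleaner option.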
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885456#comment-15885456 ] Robert Metzger commented on FLINK-5668: --- Sorry that I did not look at this JIRA earlier. [~bill.liu8904] and [~wheat9] if I understand you correctly, you want Flink on YARN not to use Hadoop's {{fs.defaultFS}} configuration for choosing the filesystem used to distribute jars and configuration files during deployment? This basically means that we need to provide a custom configuration key in Flink (something like {{yarn.deploy.fs}}) that names the file system to deploy these files to. This would allow you to use s3 for deploying Flink and hdfs for rocksdb or other state backups. I'm not able to completely understand FLINK-5631: How can I register an additional jar from a different file system as a required resource? > Reduce dependency on HDFS at job startup time > - > > Key: FLINK-5668 > URL: https://issues.apache.org/jira/browse/FLINK-5668 > Project: Flink > Issue Type: Improvement > Components: YARN >Reporter: Bill Liu > Original Estimate: 48h > Remaining Estimate: 48h > > When creating a Flink cluster on Yarn, JobManager depends on HDFS to share > taskmanager-conf.yaml with TaskManager. > It's better to share the taskmanager-conf.yaml on JobManager Web server > instead of HDFS, which could reduce the HDFS dependency at job startup. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5668) passing taskmanager configuration through taskManagerEnv instead of file
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887703#comment-15887703 ] Robert Metzger commented on FLINK-5668: --- [~wheat9] I still did not understand how you can "inject" a custom path for some of the resources deployed by Flink on YARN. All the paths are programmatically generated and there are no configuration parameters for passing custom paths (correct me if I'm wrong). Are you planning to basically fork Flink and create a custom YARN client / Application Master implementation that allows using custom paths? > regarding this issue, do you have an idea why the current implementation > writes the configuration into a file on default.FS? I think we didn't have your use case in mind when implementing the code. We assumed that one file system will be used for distributing all required files. Also, this approach works nicely with all the Hadoop vendors' versions. > What do you think if passing the configuration through the ``taskManagerEnv``? I can only assess this approach once I've understood what you are trying to achieve. > passing taskmanager configuration through taskManagerEnv instead of file > > > Key: FLINK-5668 > URL: https://issues.apache.org/jira/browse/FLINK-5668 > Project: Flink > Issue Type: Improvement > Components: YARN >Reporter: Bill Liu > Original Estimate: 48h > Remaining Estimate: 48h > > When creating a Flink cluster on Yarn, JobManager depends on HDFS to share > taskmanager-conf.yaml with TaskManager. > It's better to share the taskmanager-conf.yaml on JobManager Web server > instead of HDFS, which could reduce the HDFS dependency at job startup. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (FLINK-5598) Return jar name when jar is uploaded
[ https://issues.apache.org/jira/browse/FLINK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-5598: - Assignee: Fabian Wollert > Return jar name when jar is uploaded > > > Key: FLINK-5598 > URL: https://issues.apache.org/jira/browse/FLINK-5598 > Project: Flink > Issue Type: Improvement > Components: Web Client >Reporter: Sendoh >Assignee: Fabian Wollert > > As a Jenkins user who wants to upload a jar through an HTTP call, I want the jar > file name to be returned after the jar is uploaded. > Currently it returns nothing, as the code shows: > File newFile = new File(jarDir, UUID.randomUUID() + "_" + filename); > if (tempFile.renameTo(newFile)) { > // all went well > return "{}"; > } > Ref: > https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarUploadHandler.java#L58 > My proposal would be > return {"fileName": newFile.getName()} > Any suggestion is welcome. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5598) Return jar name when jar is uploaded
[ https://issues.apache.org/jira/browse/FLINK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894932#comment-15894932 ] Robert Metzger commented on FLINK-5598: --- Thanks a lot for fixing this issue. I assigned the JIRA to you (you can now assign JIRAs yourself, since you have "Contributor" permissions now). Note that I gave your @zalando.com account the permissions. > Return jar name when jar is uploaded > > > Key: FLINK-5598 > URL: https://issues.apache.org/jira/browse/FLINK-5598 > Project: Flink > Issue Type: Improvement > Components: Web Client >Reporter: Sendoh >Assignee: Fabian Wollert > > As a Jenkins user who wants to upload a jar through an HTTP call, I want the jar > file name to be returned after the jar is uploaded. > Currently it returns nothing, as the code shows: > File newFile = new File(jarDir, UUID.randomUUID() + "_" + filename); > if (tempFile.renameTo(newFile)) { > // all went well > return "{}"; > } > Ref: > https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarUploadHandler.java#L58 > My proposal would be > return {"fileName": newFile.getName()} > Any suggestion is welcome. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
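The proposal in FLINK-5598 is to return the stored file name instead of an empty JSON object. A minimal sketch of building such a response is shown below. It is not the code that was merged into JarUploadHandler: `JarUploadResponseSketch`, `uploadResponse` and `jsonEscape` are hypothetical helpers, and the escaping only covers backslashes and double quotes, which is an assumption about what can appear in an uploaded file name.

```java
import java.io.File;
import java.util.UUID;

final class JarUploadResponseSketch {
    /** Minimal JSON string escaping: handles backslash and double quote only. */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the response body carrying the name under which the jar was stored. */
    static String uploadResponse(File storedJar) {
        return "{\"fileName\": \"" + jsonEscape(storedJar.getName()) + "\"}";
    }

    public static void main(String[] args) {
        // Mirrors the naming scheme quoted in the issue: random UUID + "_" + original name.
        File newFile = new File("jars", UUID.randomUUID() + "_" + "WordCount.jar");
        System.out.println(uploadResponse(newFile));
    }
}
```

Returning the generated name matters because the UUID prefix is chosen on the server, so a client such as a Jenkins job has no other way to learn which name to use in later calls.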
[jira] [Commented] (FLINK-4286) Have Kafka examples that use the Kafka 0.9 connector
[ https://issues.apache.org/jira/browse/FLINK-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894965#comment-15894965 ] Robert Metzger commented on FLINK-4286: --- I would vote to not have two examples that basically differ only in the Kafka version. We can just generally update the Kafka 0.8-based example to Kafka 0.9 > Have Kafka examples that use the Kafka 0.9 connector > > > Key: FLINK-4286 > URL: https://issues.apache.org/jira/browse/FLINK-4286 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Tzu-Li (Gordon) Tai >Assignee: Dmitrii Kniazev >Priority: Minor > Labels: starter > > The {{ReadFromKafka}} and {{WriteIntoKafka}} examples use the 0.8 connector, > and the built example jar is named {{Kafka.jar}} under > {{examples/streaming/}} in the distributed package. > Since we have different connectors for different Kafka versions, it would be > good to have examples for different versions, and package them as > {{Kafka08.jar}} and {{Kafka09.jar}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (FLINK-5067) Make Flink compile with 1.8 Java compiler
[ https://issues.apache.org/jira/browse/FLINK-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-5067. --- Resolution: Fixed Assignee: Andrey Melentyev Fix Version/s: 1.3.0 Resolved in master for 1.3 in http://git-wip-us.apache.org/repos/asf/flink/commit/dc00fb0c > Make Flink compile with 1.8 Java compiler > - > > Key: FLINK-5067 > URL: https://issues.apache.org/jira/browse/FLINK-5067 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.2.0 > Environment: macOS Sierra 10.12.1, java version "1.8.0_112", Apache > Maven 3.3.9 >Reporter: Andrey Melentyev >Assignee: Andrey Melentyev >Priority: Minor > Fix For: 1.3.0 > > > Flink fails to compile when using 1.8 as source and target in Maven. There > are two types of issue that are both related to the new type inference rules: > * Call to TypeSerializer.copy method in TupleSerializer.java:112 now resolves > to a different overload than before causing a compilation error: [ERROR] > /Users/andrey.melentyev/Dev/github.com/apache/flink/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/TupleSerializer.java:[112,63] > incompatible types: void cannot be converted to java.lang.Object > * A number of unit tests using assertEquals fail to compile: > [ERROR] > /Users/andrey.melentyev/Dev/github.com/apache/flink/flink-java/src/test/java/org/apache/flink/api/common/operators/CollectionExecutionAccumulatorsTest.java:[50,25] > reference to assertEquals is ambiguous > [ERROR] both method assertEquals(long,long) in org.junit.Assert and method > assertEquals(java.lang.Object,java.lang.Object) in org.junit.Assert match > In both of the above scenarios explicitly casting one of the arguments helps > the compiler to resolve overloaded method call correctly. > It is possible to maintain Flink's code base in a state when it can be built > by both 1.7 and 1.8. 
For this purpose we need minor code fixes and an > automated build in Travis to keep the new good state. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
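The "explicitly casting one of the arguments" fix described in FLINK-5067 can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: the two assertEquals methods are local stand-ins for the org.junit.Assert overloads named in the error message (they return a label instead of asserting), and getResult is a hypothetical generic accessor, not a Flink method.

```java
public class OverloadCastSketch {

    // Local stand-ins for the two overloads named in the compiler error.
    static String assertEquals(long expected, long actual) {
        return "long,long";
    }

    static String assertEquals(Object expected, Object actual) {
        return "Object,Object";
    }

    // Hypothetical generic accessor whose return type is chosen at the call site.
    @SuppressWarnings("unchecked")
    static <T> T getResult(Object value) {
        return (T) value;
    }

    static String withPrimitiveCast() {
        // A call like assertEquals(3L, <Long-typed expression>) matches both overloads
        // (one via unboxing, one via boxing), which javac reports as ambiguous.
        // Casting the second argument to long makes (long, long) the only strict match.
        return assertEquals(3L, (long) OverloadCastSketch.<Long>getResult(Long.valueOf(3)));
    }

    static String withObjectCast() {
        // Casting the first argument to Object rules out the (long, long) overload.
        return assertEquals((Object) Long.valueOf(3),
                OverloadCastSketch.<Long>getResult(Long.valueOf(3)));
    }

    public static void main(String[] args) {
        System.out.println(withPrimitiveCast());
        System.out.println(withObjectCast());
    }
}
```

Either cast pins the call to a single overload, so the same source should compile the same way under both the 1.7 and the 1.8 compiler.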
[jira] [Updated] (FLINK-2268) Provide Flink binary release without Hadoop
[ https://issues.apache.org/jira/browse/FLINK-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-2268: -- Description: Currently, all Flink releases ship with Hadoop 2.3.0 binaries. The big Hadoop distributions are usually not relying on vanilla Hadoop releases, but on custom patched versions. To provide the best user experience, we should offer a Flink binary that uses the Hadoop jars provided by the user (=hadoop distribution) was: Currently, all Flink releases ship with Hadoop 2.2.0 binaries. The big Hadoop distributions are usually not relying on vanilla Hadoop releases, but on custom patched versions. To provide the best user experience, we should offer a Flink binary that uses the Hadoop jars provided by the user (=hadoop distribution) > Provide Flink binary release without Hadoop > --- > > Key: FLINK-2268 > URL: https://issues.apache.org/jira/browse/FLINK-2268 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Robert Metzger > > Currently, all Flink releases ship with Hadoop 2.3.0 binaries. > The big Hadoop distributions are usually not relying on vanilla Hadoop > releases, but on custom patched versions. > To provide the best user experience, we should offer a Flink binary that uses > the Hadoop jars provided by the user (=hadoop distribution) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5998) Un-fat Hadoop from Flink fat jar
Robert Metzger created FLINK-5998: - Summary: Un-fat Hadoop from Flink fat jar Key: FLINK-5998 URL: https://issues.apache.org/jira/browse/FLINK-5998 Project: Flink Issue Type: Improvement Components: Build System Reporter: Robert Metzger As a first step towards FLINK-2268, I would suggest putting all Hadoop dependencies into a jar separate from Flink's fat jar. This would allow users to put a custom Hadoop jar in there, or even deploy Flink without a Hadoop fat jar at all in environments where Hadoop is provided (e.g. EMR). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5379) Flink CliFrontend does not return when not logged in with kerberos
Robert Metzger created FLINK-5379: - Summary: Flink CliFrontend does not return when not logged in with kerberos Key: FLINK-5379 URL: https://issues.apache.org/jira/browse/FLINK-5379 Project: Flink Issue Type: Bug Components: Client Affects Versions: 1.2.0 Reporter: Robert Metzger In pre 1.2 versions, Flink immediately fails when trying to deploy it on YARN and the current user is not kerberos authenticated: {code} Error while deploying YARN cluster: Couldn't deploy Yarn cluster java.lang.RuntimeException: Couldn't deploy Yarn cluster at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:384) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:591) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:465) Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: In secure mode. Please provide Kerberos credentials in order to authenticate. You may use kinit to authenticate and request a TGT from the Kerberos server. at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:371) ... 
2 more {code} In 1.2, the following happens: {code} 2016-12-21 13:51:29,925 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at my-cluster-2wv1.c.sorter-757.internal/10.240.0.24:8032 2016-12-21 13:51:30,153 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:51:30,154 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:51:30,154 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:00,171 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:00,172 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:00,172 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed 
to find any Kerberos tgt)] 2016-12-21 13:52:30,188 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:30,189 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:30,189 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:53:00,203 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:53:00,204 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the
[jira] [Updated] (FLINK-5379) Flink CliFrontend does not return when not logged in with kerberos
[ https://issues.apache.org/jira/browse/FLINK-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5379: -- Description: In pre 1.2 versions, Flink immediately fails when trying to deploy it on YARN and the current user is not kerberos authenticated: {code} Error while deploying YARN cluster: Couldn't deploy Yarn cluster java.lang.RuntimeException: Couldn't deploy Yarn cluster at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:384) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:591) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:465) Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: In secure mode. Please provide Kerberos credentials in order to authenticate. You may use kinit to authenticate and request a TGT from the Kerberos server. at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:371) ... 2 more {code} In 1.2, the following happens (the CLI frontend does not return. 
It seems to be stuck in a loop) {code} 2016-12-21 13:51:29,925 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at my-cluster-2wv1.c.sorter-757.internal/10.240.0.24:8032 2016-12-21 13:51:30,153 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:51:30,154 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:51:30,154 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:00,171 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:00,172 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:00,172 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any 
Kerberos tgt)] 2016-12-21 13:52:30,188 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:30,189 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:52:30,189 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:53:00,203 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:longrunning (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 2016-12-21 13:53:00,204 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism lev
[jira] [Created] (FLINK-5380) Number of outgoing records not reported in web interface
Robert Metzger created FLINK-5380: - Summary: Number of outgoing records not reported in web interface Key: FLINK-5380 URL: https://issues.apache.org/jira/browse/FLINK-5380 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: 1.2.0 Reporter: Robert Metzger The web frontend does not report any outgoing records. The amount of data in MB is reported correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5380) Number of outgoing records not reported in web interface
[ https://issues.apache.org/jira/browse/FLINK-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5380: -- Attachment: outRecordsNotreported.png > Number of outgoing records not reported in web interface > > > Key: FLINK-5380 > URL: https://issues.apache.org/jira/browse/FLINK-5380 > Project: Flink > Issue Type: Bug > Components: Webfrontend >Affects Versions: 1.2.0 >Reporter: Robert Metzger > Attachments: outRecordsNotreported.png > > > The web frontend does not report any outgoing records. > The amount of data in MB is reported correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5381) Scrolling in some web interface pages doesn't work (taskmanager details, jobmanager config)
Robert Metzger created FLINK-5381: - Summary: Scrolling in some web interface pages doesn't work (taskmanager details, jobmanager config) Key: FLINK-5381 URL: https://issues.apache.org/jira/browse/FLINK-5381 Project: Flink Issue Type: Bug Affects Versions: 1.2.0 Reporter: Robert Metzger It seems that scrolling in the web interface doesn't work anymore on some pages in the 1.2 release branch. Example pages: - When you click the "JobManager" tab - The TaskManager logs page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5382) Taskmanager log download button causes 404
Robert Metzger created FLINK-5382: - Summary: Taskmanager log download button causes 404 Key: FLINK-5382 URL: https://issues.apache.org/jira/browse/FLINK-5382 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: 1.2.0 Reporter: Robert Metzger The "download logs" button when viewing the TaskManager logs in the web UI leads to a 404 page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5383) TaskManager fails with SIGBUS when loading RocksDB
Robert Metzger created FLINK-5383: - Summary: TaskManager fails with SIGBUS when loading RocksDB Key: FLINK-5383 URL: https://issues.apache.org/jira/browse/FLINK-5383 Project: Flink Issue Type: Bug Affects Versions: 1.2.0 Reporter: Robert Metzger While trying out Flink 1.2, my TaskManager died with the following error while deploying a job: {code} 2016-12-21 15:57:50,080 INFO org.apache.flink.runtime.taskmanager.Task - Map -> Sink : Unnamed (15/16) (50f527e4445479fb1fc9f34394d86d2f) switched from DEPLOYING to RUNNING. 2016-12-21 15:57:50,081 INFO org.apache.flink.runtime.taskmanager.Task - Map -> Sink : Unnamed (16/16) (b4b3d3340de587d729fe83d65eac3e10) switched from DEPLOYING to RUNNING. 2016-12-21 15:57:50,081 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Using user- defined state backend: RocksDB State Backend {isInitialized=false, configuredDbBasePaths=null, initialize dDbBasePaths=null, checkpointStreamBackend=File State Backend @ hdfs://nameservice1/shared/checkpoint-dir -rocks}. 2016-12-21 15:57:50,081 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - Using user- defined state backend: RocksDB State Backend {isInitialized=false, configuredDbBasePaths=null, initialize dDbBasePaths=null, checkpointStreamBackend=File State Backend @ hdfs://nameservice1/shared/checkpoint-dir -rocks}. 
2016-12-21 15:57:50,223 INFO org.apache.flink.contrib.streaming.state.RocksDBStateBackend - Attempting to load RocksDB native library and store it at '/yarn/nm/usercache/longrunning/appcache/application_14821 56101125_0016' LogType:taskmanager.out Log Upload Time:Wed Dec 21 16:00:35 + 2016 LogLength:959 Log Contents: # # A fatal error has been detected by the Java Runtime Environment: # # SIGBUS (0x7) at pc=0x7fe745fd596a, pid=7414, tid=140630801725184 # # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01) # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops) # Problematic frame: # C [ld-linux-x86-64.so.2+0x1a96a] realloc+0x2bfa # {code} the error report file contained the following frames: {code} Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j java.lang.ClassLoader$NativeLibrary.load(Ljava/lang/String;)V+0 j java.lang.ClassLoader.loadLibrary1(Ljava/lang/Class;Ljava/io/File;)Z+302 j java.lang.ClassLoader.loadLibrary0(Ljava/lang/Class;Ljava/io/File;)Z+2 j java.lang.ClassLoader.loadLibrary(Ljava/lang/Class;Ljava/lang/String;Z)V+48 j java.lang.Runtime.load0(Ljava/lang/Class;Ljava/lang/String;)V+57 j java.lang.System.load(Ljava/lang/String;)V+7 j org.rocksdb.NativeLibraryLoader.loadLibraryFromJar(Ljava/lang/String;)V+14 j org.rocksdb.NativeLibraryLoader.loadLibrary(Ljava/lang/String;)V+22 j org.apache.flink.contrib.streaming.state.RocksDBStateBackend.ensureRocksDBIsLoaded(Ljava/lang/String;)V+62 j org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(Lorg/apache/flink/runtime/execution/Environment;Lorg/apache/flink/api/common/JobID;Ljava/lang/String;Lorg/apache/flink/api/common/typeutils/TypeSerializer;ILorg/apache/flink/runtime/state/KeyGroupRange;Lorg/apache/flink/runtime/query/TaskKvStateRegistry;)Lorg/apache/flink/runtime/state/AbstractKeyedStateBackend;+16 j 
org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(Lorg/apache/flink/api/common/typeutils/TypeSerializer;ILorg/apache/flink/runtime/state/KeyGroupRange;)Lorg/apache/flink/runtime/state/AbstractKeyedStateBackend;+137 {code} I saw this error only once so far. I'll report again if it happens more frequently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5382) Taskmanager log download button causes 404
[ https://issues.apache.org/jira/browse/FLINK-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15778843#comment-15778843 ] Robert Metzger commented on FLINK-5382: --- Did you use Flink on YARN to reproduce the issue? > Taskmanager log download button causes 404 > -- > > Key: FLINK-5382 > URL: https://issues.apache.org/jira/browse/FLINK-5382 > Project: Flink > Issue Type: Bug > Components: Webfrontend >Affects Versions: 1.2.0 >Reporter: Robert Metzger >Assignee: Sachin Goel > > The "download logs" button when viewing the TaskManager logs in the web UI > leads to a 404 page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-4861) Package optional project artifacts
[ https://issues.apache.org/jira/browse/FLINK-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-4861. --- Resolution: Fixed Resolved in https://github.com/apache/flink/commit/5c76baa1734303a01472afd17cfaf3442eb06c43 > Package optional project artifacts > -- > > Key: FLINK-4861 > URL: https://issues.apache.org/jira/browse/FLINK-4861 > Project: Flink > Issue Type: New Feature > Components: Build System >Affects Versions: 1.2.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.2.0 > > > Per the mailing list > [discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Additional-project-downloads-td13223.html], > package the Flink libraries and connectors into subdirectories of a new > {{opt}} directory in the release/snapshot tarballs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4391) Provide support for asynchronous operations over streams
[ https://issues.apache.org/jira/browse/FLINK-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-4391: -- Fix Version/s: 1.2.0 > Provide support for asynchronous operations over streams > > > Key: FLINK-4391 > URL: https://issues.apache.org/jira/browse/FLINK-4391 > Project: Flink > Issue Type: New Feature > Components: DataStream API >Reporter: Jamie Grier >Assignee: david.wang > Fix For: 1.2.0 > > > Many Flink users need to do asynchronous processing driven by data from a > DataStream. The classic example would be joining against an external > database in order to enrich a stream with extra information. > It would be nice to add general support for this type of operation in the > Flink API. Ideally this could simply take the form of a new operator that > manages async operations, keeps so many of them in flight, and then emits > results to downstream operators as the async operations complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4391) Provide support for asynchronous operations over streams
[ https://issues.apache.org/jira/browse/FLINK-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794719#comment-15794719 ] Robert Metzger commented on FLINK-4391: --- I could not find any documentation in f52830763d8f95a955c10265e2c3543a5890e719. What's the plan for documenting the feature? > Provide support for asynchronous operations over streams > > > Key: FLINK-4391 > URL: https://issues.apache.org/jira/browse/FLINK-4391 > Project: Flink > Issue Type: New Feature > Components: DataStream API >Reporter: Jamie Grier >Assignee: david.wang > Fix For: 1.2.0 > > > Many Flink users need to do asynchronous processing driven by data from a > DataStream. The classic example would be joining against an external > database in order to enrich a stream with extra information. > It would be nice to add general support for this type of operation in the > Flink API. Ideally this could simply take the form of a new operator that > manages async operations, keeps so many of them in flight, and then emits > results to downstream operators as the async operations complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
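The idea described in FLINK-4391 (start asynchronous requests, keep only a bounded number in flight, emit results as they complete) can be sketched in plain Java 8 as below. This is not the Flink operator or its API: the class BoundedAsyncSketch, the method enrich, and the "user-" lookup are hypothetical names chosen for illustration, and for simplicity the sketch collects results in input order instead of emitting them to a downstream operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

public class BoundedAsyncSketch {

    /**
     * Applies asyncLookup to every input while keeping at most maxInFlight
     * requests outstanding, and returns the results in input order.
     */
    static <IN, OUT> List<OUT> enrich(List<IN> inputs,
                                      Function<IN, CompletableFuture<OUT>> asyncLookup,
                                      int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<OUT>> pending = new ArrayList<>();
        for (IN input : inputs) {
            // Blocks (back-pressure) while maxInFlight requests are outstanding.
            permits.acquireUninterruptibly();
            CompletableFuture<OUT> future = asyncLookup.apply(input);
            // Free the slot when the request finishes, successfully or not.
            future.whenComplete((result, error) -> permits.release());
            pending.add(future);
        }
        List<OUT> results = new ArrayList<>();
        for (CompletableFuture<OUT> future : pending) {
            // join() rethrows a failed lookup as CompletionException.
            results.add(future.join());
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Stand-in for a lookup against an external database.
            Function<Integer, CompletableFuture<String>> lookup =
                    id -> CompletableFuture.supplyAsync(() -> "user-" + id, pool);
            System.out.println(enrich(Arrays.asList(1, 2, 3), lookup, 2));
        } finally {
            pool.shutdown();
        }
    }
}
```

The semaphore is what bounds the number of requests in flight: once the limit is reached, the loop that reads inputs stalls until an earlier request completes, which is roughly the back-pressure behaviour one would expect from such an operator.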
[jira] [Updated] (FLINK-5264) Improve error message when assigning uid to intermediate operator
[ https://issues.apache.org/jira/browse/FLINK-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5264: -- Component/s: Java API > Improve error message when assigning uid to intermediate operator > - > > Key: FLINK-5264 > URL: https://issues.apache.org/jira/browse/FLINK-5264 > Project: Flink > Issue Type: Bug > Components: Java API >Affects Versions: 1.1.3 >Reporter: Kostas Kloudas >Assignee: Ufuk Celebi > Fix For: 1.2.0 > > > Currently when trying to assign uid to an intermediate operator in a chain, > the error message just says that it is not the right place, without any > further information. > We could improve the message by telling explicitly to the user on which > operator to assign the uid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3710) ScalaDocs for org.apache.flink.streaming.scala are missing from the web site
[ https://issues.apache.org/jira/browse/FLINK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798948#comment-15798948 ] Robert Metzger commented on FLINK-3710: --- I agree that the current situation is not acceptable. I did a quick search on the web, but I also could not find a good way of generating aggregated scaladocs. Apache Spark uses SBT: https://issues.apache.org/jira/browse/SPARK-1439. I'll remove the links from our website for now. The only solution I see is having separate scaladocs links for the batch and streaming API. But I guess that they also have some code in common which will not be cross-referenced then. > ScalaDocs for org.apache.flink.streaming.scala are missing from the web site > > > Key: FLINK-3710 > URL: https://issues.apache.org/jira/browse/FLINK-3710 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.0.1 >Reporter: Elias Levy > Fix For: 1.0.4 > > > The ScalaDocs only include docs for org.apache.flink.scala and sub-packages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3710) ScalaDocs for org.apache.flink.streaming.scala are missing from the web site
[ https://issues.apache.org/jira/browse/FLINK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798959#comment-15798959 ] Robert Metzger commented on FLINK-3710: --- I removed the links in this commit: http://git-wip-us.apache.org/repos/asf/flink-web/commit/4d8a7e26 > ScalaDocs for org.apache.flink.streaming.scala are missing from the web site > > > Key: FLINK-3710 > URL: https://issues.apache.org/jira/browse/FLINK-3710 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.0.1 >Reporter: Elias Levy > Fix For: 1.0.4 > > > The ScalaDocs only include docs for org.apache.flink.scala and sub-packages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-5382) Taskmanager log download button causes 404
[ https://issues.apache.org/jira/browse/FLINK-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-5382. --- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Resolved for 1.3 (master) in http://git-wip-us.apache.org/repos/asf/flink/commit/29eec70d and 1.2 in http://git-wip-us.apache.org/repos/asf/flink/commit/a6a5 > Taskmanager log download button causes 404 > -- > > Key: FLINK-5382 > URL: https://issues.apache.org/jira/browse/FLINK-5382 > Project: Flink > Issue Type: Bug > Components: Webfrontend >Affects Versions: 1.2.0 >Reporter: Robert Metzger >Assignee: Sachin Goel > Fix For: 1.2.0, 1.3.0 > > > The "download logs" button when viewing the TaskManager logs in the web UI > leads to a 404 page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-5383) TaskManager fails with SIGBUS when loading RocksDB
[ https://issues.apache.org/jira/browse/FLINK-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-5383. --- Resolution: Fixed Assignee: Stephan Ewen Fix Version/s: 1.3.0 > TaskManager fails with SIGBUS when loading RocksDB > -- > > Key: FLINK-5383 > URL: https://issues.apache.org/jira/browse/FLINK-5383 > Project: Flink > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Robert Metzger >Assignee: Stephan Ewen > Fix For: 1.3.0 > > > While trying out Flink 1.2, my TaskManager died with the following error > while deploying a job: > {code} > 2016-12-21 15:57:50,080 INFO org.apache.flink.runtime.taskmanager.Task > - Map -> Sink > : Unnamed (15/16) (50f527e4445479fb1fc9f34394d86d2f) switched from DEPLOYING > to RUNNING. > 2016-12-21 15:57:50,081 INFO org.apache.flink.runtime.taskmanager.Task > - Map -> Sink > : Unnamed (16/16) (b4b3d3340de587d729fe83d65eac3e10) switched from DEPLOYING > to RUNNING. > 2016-12-21 15:57:50,081 INFO > org.apache.flink.streaming.runtime.tasks.StreamTask - Using user- > defined state backend: RocksDB State Backend {isInitialized=false, > configuredDbBasePaths=null, initialize > dDbBasePaths=null, checkpointStreamBackend=File State Backend @ > hdfs://nameservice1/shared/checkpoint-dir > -rocks}. > 2016-12-21 15:57:50,081 INFO > org.apache.flink.streaming.runtime.tasks.StreamTask - Using user- > defined state backend: RocksDB State Backend {isInitialized=false, > configuredDbBasePaths=null, initialize > dDbBasePaths=null, checkpointStreamBackend=File State Backend @ > hdfs://nameservice1/shared/checkpoint-dir > -rocks}. 
> 2016-12-21 15:57:50,223 INFO > org.apache.flink.contrib.streaming.state.RocksDBStateBackend - Attempting > to load RocksDB native library and store it at > '/yarn/nm/usercache/longrunning/appcache/application_14821 > 56101125_0016' > LogType:taskmanager.out > Log Upload Time:Wed Dec 21 16:00:35 + 2016 > LogLength:959 > Log Contents: > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGBUS (0x7) at pc=0x7fe745fd596a, pid=7414, tid=140630801725184 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build > 1.7.0_67-b01) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C [ld-linux-x86-64.so.2+0x1a96a] realloc+0x2bfa > # > {code} > the error report file contained the following frames: > {code} > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > j java.lang.ClassLoader$NativeLibrary.load(Ljava/lang/String;)V+0 > j java.lang.ClassLoader.loadLibrary1(Ljava/lang/Class;Ljava/io/File;)Z+302 > j java.lang.ClassLoader.loadLibrary0(Ljava/lang/Class;Ljava/io/File;)Z+2 > j java.lang.ClassLoader.loadLibrary(Ljava/lang/Class;Ljava/lang/String;Z)V+48 > j java.lang.Runtime.load0(Ljava/lang/Class;Ljava/lang/String;)V+57 > j java.lang.System.load(Ljava/lang/String;)V+7 > j org.rocksdb.NativeLibraryLoader.loadLibraryFromJar(Ljava/lang/String;)V+14 > j org.rocksdb.NativeLibraryLoader.loadLibrary(Ljava/lang/String;)V+22 > j > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.ensureRocksDBIsLoaded(Ljava/lang/String;)V+62 > j > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(Lorg/apache/flink/runtime/execution/Environment;Lorg/apache/flink/api/common/JobID;Ljava/lang/String;Lorg/apache/flink/api/common/typeutils/TypeSerializer;ILorg/apache/flink/runtime/state/KeyGroupRange;Lorg/apache/flink/runtime/query/TaskKvStateRegistry;)Lorg/apache/flink/runtime/state/AbstractKeyedStateBackend;+16 > j > 
org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(Lorg/apache/flink/api/common/typeutils/TypeSerializer;ILorg/apache/flink/runtime/state/KeyGroupRange;)Lorg/apache/flink/runtime/state/AbstractKeyedStateBackend;+137 > {code} > I saw this error only once so far. I'll report again if it happens more > frequently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5383) TaskManager fails with SIGBUS when loading RocksDB
[ https://issues.apache.org/jira/browse/FLINK-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811424#comment-15811424 ] Robert Metzger commented on FLINK-5383: --- I can confirm that the mentioned commit is fixing the issue! I'm closing the JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4973) Flakey Yarn tests due to recently added latency marker
[ https://issues.apache.org/jira/browse/FLINK-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814982#comment-15814982 ] Robert Metzger commented on FLINK-4973: --- Any objections for logging only the exception message in the {{LatencyMarksEmitter}} ? Currently, we log the full stack trace, which pollutes the logs. > Flakey Yarn tests due to recently added latency marker > -- > > Key: FLINK-4973 > URL: https://issues.apache.org/jira/browse/FLINK-4973 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.2.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.2.0 > > > The newly introduced {{LatencyMarksEmitter}} emits latency marker on the > {{Output}}. This can still happen after the underlying {{BufferPool}} has > been destroyed. The occurring exception is then logged: > {code} > 2016-10-29 15:00:48,088 INFO org.apache.flink.runtime.taskmanager.Task > - Source: Custom File Source (1/1) switched to FINISHED > 2016-10-29 15:00:48,089 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Source: Custom File Source (1/1) > 2016-10-29 15:00:48,089 INFO org.apache.flink.yarn.YarnTaskManager > - Un-registering task and sending final execution state > FINISHED to JobManager for task Source: Custom File Source > (8fe0f817fa6d960ea33f6e57e0c3891c) > 2016-10-29 15:00:48,101 WARN > org.apache.flink.streaming.api.operators.AbstractStreamOperator - Error > while emitting latency marker > java.lang.RuntimeException: Buffer pool is destroyed. 
> at > org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitLatencyMarker(RecordWriterOutput.java:99) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.emitLatencyMarker(AbstractStreamOperator.java:734) > at > org.apache.flink.streaming.api.operators.StreamSource$LatencyMarksEmitter$1.run(StreamSource.java:134) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalStateException: Buffer pool is destroyed. > at > org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:144) > at > org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:133) > at > org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:118) > at > org.apache.flink.runtime.io.network.api.writer.RecordWriter.randomEmit(RecordWriter.java:103) > at > org.apache.flink.streaming.runtime.io.StreamRecordWriter.randomEmit(StreamRecordWriter.java:104) > at > org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitLatencyMarker(RecordWriterOutput.java:96) > ... 9 more > {code} > This exception is clearly related to the shutdown of a stream operator and > does not indicate a wrong behaviour. Since the yarn tests simply scan the log > for some keywords (including exception) such a case can make them fail. 
> Ideally, we would make sure that the {{LatencyMarksEmitter}} only emits > latency markers while the {{Output}} is still active. Alternatively, we could > simply not log exceptions that occur after the stream operator has been > stopped. > https://s3.amazonaws.com/archive.travis-ci.org/jobs/171578846/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
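The two ideas in this thread (stop emitting once the operator is stopped, and log only the exception message instead of the full stack trace) can be sketched together. This is a minimal illustration, not Flink's actual code: `MarkerOutput`, `GuardedLatencyEmitter`, and the in-memory `log` list are hypothetical stand-ins for the real `Output`, `LatencyMarksEmitter`, and logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the operator output; not Flink's Output interface. */
interface MarkerOutput {
    void emitLatencyMarker(long timestamp);
}

/** Hypothetical emitter that stops emitting once closed and keeps log lines short. */
class GuardedLatencyEmitter {
    private final MarkerOutput output;
    private final AtomicBoolean running = new AtomicBoolean(true);
    final List<String> log = new ArrayList<>(); // stands in for a real logger

    GuardedLatencyEmitter(MarkerOutput output) {
        this.output = output;
    }

    /** Marks the operator as stopped; later emit attempts are skipped. */
    void close() {
        running.set(false);
    }

    /** In a real system a timer would call this periodically. */
    void emitOnce(long timestamp) {
        if (!running.get()) {
            return; // operator already stopped: do not touch the output
        }
        try {
            output.emitLatencyMarker(timestamp);
        } catch (RuntimeException e) {
            if (running.get()) {
                // still running: log only the message, not the full stack trace
                log.add("WARN Error while emitting latency marker: " + e.getMessage());
            }
            // otherwise the failure raced with shutdown and is dropped
        }
    }
}

class GuardedLatencyEmitterDemo {
    public static void main(String[] args) {
        MarkerOutput destroyed = ts -> {
            throw new IllegalStateException("Buffer pool is destroyed.");
        };
        GuardedLatencyEmitter emitter = new GuardedLatencyEmitter(destroyed);
        emitter.emitOnce(1L); // adds one short log line
        emitter.close();
        emitter.emitOnce(2L); // skipped, nothing is logged
        System.out.println(emitter.log);
    }
}
```

The flag check alone cannot close the race completely, because the output can be destroyed between the check and the emit call. That is why the sketch also re-checks the flag inside the `catch` block before logging.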
[jira] [Commented] (FLINK-5297) Relocate Flink's Hadoop dependency and its transitive dependencies
[ https://issues.apache.org/jira/browse/FLINK-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815050#comment-15815050 ] Robert Metzger commented on FLINK-5297: --- I've created a branch that shades ALL Hadoop dependencies (I've only validated this with Hadoop 2.3.0 and 2.7.3). https://github.com/rmetzger/flink/tree/flink5297 The YARN tests are NOT working, because some of YARN's web servers do not start properly. The problem is that Jetty dynamically loads some servlet classes that are named in a configuration file. That config file is not rewritten, which is why Jetty cannot find the classes. I've manually tested the branch on a YARN cluster and it works. > Relocate Flink's Hadoop dependency and its transitive dependencies > -- > > Key: FLINK-5297 > URL: https://issues.apache.org/jira/browse/FLINK-5297 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.2.0 >Reporter: Till Rohrmann > Fix For: 1.2.0 > > > A user reported that they have a dependency conflict with one of Hadoop's > dependencies. More concretely, it is the {{aws-java-sdk-*}} dependency, which > is not backward compatible. The user depends on a newer {{aws-java-sdk}} > version that cannot be used by Hadoop version 2.7. > A solution for future dependency conflicts could be to relocate Hadoop's > dependencies or even all of the Hadoop dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
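The Jetty problem described in the comment can be pictured with a small simulation. The sketch below is only an illustration under assumptions: the prefix `shaded.example.` and the class `org.example.web.StatusServlet` are hypothetical (the thread does not name the real relocation prefix or the affected servlets), and a plain `Set` stands in for the contents of the shaded jar.

```java
import java.util.Collections;
import java.util.Set;

class RelocationSketch {
    // Hypothetical relocation prefix; the branch's real prefix is not given in the thread.
    static final String SHADED_PREFIX = "shaded.example.";

    /** Name a class carries after relocation. */
    static String relocated(String className) {
        return SHADED_PREFIX + className;
    }

    /** Simulates a by-name lookup against the classes present in the shaded jar. */
    static boolean canLoad(String nameFromConfig, Set<String> classesInJar) {
        return classesInJar.contains(nameFromConfig);
    }

    public static void main(String[] args) {
        String servlet = "org.example.web.StatusServlet"; // hypothetical servlet class
        Set<String> shadedJar = Collections.singleton(relocated(servlet));

        // Config file left untouched: it still names the original class.
        System.out.println(canLoad(servlet, shadedJar));
        // Config file rewritten alongside the bytecode: lookup succeeds.
        System.out.println(canLoad(relocated(servlet), shadedJar));
    }
}
```

Relocation rewrites references inside class files, while a class name stored as text in a resource file stays as it was unless that file is transformed as well. This matches the failure mode the comment reports.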
[jira] [Commented] (FLINK-4973) Flakey Yarn tests due to recently added latency marker
[ https://issues.apache.org/jira/browse/FLINK-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815053#comment-15815053 ] Robert Metzger commented on FLINK-4973: --- The stack trace is only logged in the LatencyMarksEmitter. But my job is failing frequently. In a 24 hrs run, I got over 8000 of these stack traces in my log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5450) WindowOperator logs about "re-registering state from an older Flink version" even though its not a restored window
Robert Metzger created FLINK-5450: - Summary: WindowOperator logs about "re-registering state from an older Flink version" even though its not a restored window Key: FLINK-5450 URL: https://issues.apache.org/jira/browse/FLINK-5450 Project: Flink Issue Type: Bug Components: Windowing Operators Affects Versions: 1.2.0 Reporter: Robert Metzger While testing the RC0 of Flink 1.2, I stumbled across this log message {code} 15:42:02,855 INFO org.apache.flink.streaming.api.operators.AbstractStreamOperator - WindowOperator (taskIdx=WindowOperator) re-registering state from an older Flink version. {code} My WindowOperator is not restored, so I find this log message a bit misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4973) Flakey Yarn tests due to recently added latency marker
[ https://issues.apache.org/jira/browse/FLINK-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819056#comment-15819056 ] Robert Metzger commented on FLINK-4973: --- {code} 2017-01-10 04:59:22,578 INFO org.apache.flink.runtime.taskmanager.Task - Task Source: sale event generator (4/20) is already in state CANCELING 2017-01-10 04:59:22,578 INFO org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code Sink: control events sink (4/20) (85881281bb76d124a85b50156f59b0fd). 2017-01-10 04:59:22,579 WARN org.apache.flink.streaming.api.operators.AbstractStreamOperator - Error while emitting latency marker. java.lang.RuntimeException: Buffer pool is destroyed. at org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitLatencyMarker(RecordWriterOutput.java:99) at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitLatencyMarker(OperatorChain.java:445) at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.emitLatencyMarker(AbstractStreamOperator.java:743) at org.apache.flink.streaming.api.operators.StreamSource$LatencyMarksEmitter$1.onProcessingTime(StreamSource.java:142) at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$RepeatedTriggerTask.run(SystemProcessingTimeService.java:256) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Buffer pool is 
destroyed. at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:149) at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:138) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:126) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.randomEmit(RecordWriter.java:102) at org.apache.flink.streaming.runtime.io.StreamRecordWriter.randomEmit(StreamRecordWriter.java:104) at org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitLatencyMarker(RecordWriterOutput.java:96) ... 11 more 2017-01-10 04:59:22,581 WARN org.apache.flink.streaming.api.operators.AbstractStreamOperator - Error while emitting latency marker. java.lang.RuntimeException: Buffer pool is destroyed. at org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitLatencyMarker(RecordWriterOutput.java:99) at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitLatencyMarker(OperatorChain.java:445) at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.emitLatencyMarker(AbstractStreamOperator.java:743) at org.apache.flink.streaming.api.operators.StreamSource$LatencyMarksEmitter$1.onProcessingTime(StreamSource.java:142) at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$RepeatedTriggerTask.run(SystemProcessingTimeService.java:256) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Buffer pool is destroyed. at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:149) at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:138) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:126) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.randomEmit(RecordWriter.java:102) at org.apache.flink.streaming.runtime.io.StreamRecordWriter.randomEmit(StreamRecordW
[jira] [Created] (FLINK-5462) Flink job fails due to java.util.concurrent.CancellationException while snapshotting
Robert Metzger created FLINK-5462: - Summary: Flink job fails due to java.util.concurrent.CancellationException while snapshotting Key: FLINK-5462 URL: https://issues.apache.org/jira/browse/FLINK-5462 Project: Flink Issue Type: Bug Components: State Backends, Checkpointing Affects Versions: 1.2.0 Reporter: Robert Metzger I'm using Flink 699f4b0. My restored, rescaled Flink job failed while creating a checkpoint with the following exception: {code} 2017-01-11 18:46:49,853 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 3 @ 1484160409846 2017-01-11 18:49:50,111 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream .apply(AllWindowedStream.java:440)) (1/1) (2accc6ca2727c4f7ec963318fbd237e9) switched from RUNNING to FAILED. AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.ap ply(AllWindowedStream.java:440)) (1/1).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWind 
owedStream.java:440)) (1/1). ... 6 more Caused by: java.util.concurrent.CancellationException at java.util.concurrent.FutureTask.report(FutureTask.java:121) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:899) ... 5 more 2017-01-11 18:49:50,113 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Job Generate Event Window stream (90859d392c1da472e07695f434b332ef) switched from state RUNNING to FAILING. AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.ap ply(AllWindowedStream.java:440)) (1/1).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 3 for operator TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1). ... 
6 more Caused by: java.util.concurrent.CancellationException at java.util.concurrent.FutureTask.report(FutureTask.java:121) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:899) ... 5 more 2017-01-11 18:49:50,122 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Source: Custom Source -> Timestamps/Watermarks (1/2) (e52c1211b5693552f5908b0082c80882) switched from RUNNING to CANCELING. {code} There are no other log entries around that time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
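The innermost frames of the trace (`FutureTask.report` called from `FutureTask.get`) are what the JDK produces when `get()` is called on a future that has already been cancelled. The sketch below reproduces only that JDK behaviour. It assumes, without confirmation from the report, that the snapshot future was cancelled by another thread before the checkpoint runnable asked for its result. The class and method names are made up for illustration.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class CancelledFutureDemo {
    /** Cancels a task before it runs and reports what get() does afterwards. */
    static String outcomeAfterCancel() {
        FutureTask<String> snapshot = new FutureTask<>(() -> "state handle");
        snapshot.cancel(true); // assumed to come from another thread in the real job
        snapshot.run();        // no-op: a cancelled FutureTask never runs its callable
        try {
            return snapshot.get();
        } catch (CancellationException e) {
            return "CancellationException";
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(outcomeAfterCancel()); // prints "CancellationException"
    }
}
```

A caller that wants to treat cancellation as a non-failure has to catch `CancellationException` explicitly. It is an unchecked exception, so the compiler does not force handling it.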
[jira] [Created] (FLINK-5463) RocksDB.disposeInternal does not react to interrupts, blocks task cancellation
Robert Metzger created FLINK-5463: - Summary: RocksDB.disposeInternal does not react to interrupts, blocks task cancellation Key: FLINK-5463 URL: https://issues.apache.org/jira/browse/FLINK-5463 Project: Flink Issue Type: Bug Components: State Backends, Checkpointing Affects Versions: 1.2.0 Reporter: Robert Metzger I'm using Flink 699f4b0. My Flink job is slow to cancel because RocksDB seems to be busy disposing of its state: {code} 2017-01-11 18:48:23,315 INFO org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1) (2accc6ca2727c4f7ec963318fbd237e9). 2017-01-11 18:48:53,318 WARN org.apache.flink.runtime.taskmanager.Task - Task 'TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1)' did not react to cancelling signal, but is stuck in method: org.rocksdb.RocksDB.disposeInternal(Native Method) org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37) org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:56) org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:250) org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:331) org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:169) org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.dispose(WindowOperator.java:273) org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:439) org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:340) 
org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) java.lang.Thread.run(Thread.java:745) 2017-01-11 18:48:53,319 WARN org.apache.flink.runtime.taskmanager.Task - Task 'TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1)' did not react to cancelling signal, but is stuck in method: org.rocksdb.RocksDB.disposeInternal(Native Method) org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37) org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:56) org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:250) org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:331) org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:169) org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.dispose(WindowOperator.java:273) org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:439) org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:340) org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) java.lang.Thread.run(Thread.java:745) 2017-01-11 18:49:23,319 WARN org.apache.flink.runtime.taskmanager.Task - Task 'TriggerWindow(TumblingEventTimeWindows(4), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1)' did not react to cancelling signal, but is stuck in method: org.rocksdb.RocksDB.disposeInternal(Native Method) org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37) org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:56) 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:250) org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:331) org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:169) org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.dispose(WindowOperator.java:273) org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:439) org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:340) org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) java.lang.Thread.run(Thread.java:745) 2017-01-11 18:49:50,080 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for TriggerWindow(Tumbli
[jira] [Created] (FLINK-5464) MetricQueryService throws NullPointerException on JobManager
Robert Metzger created FLINK-5464: - Summary: MetricQueryService throws NullPointerException on JobManager Key: FLINK-5464 URL: https://issues.apache.org/jira/browse/FLINK-5464 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: 1.2.0 Reporter: Robert Metzger I'm using Flink 699f4b0. My JobManager log contains many of these log entries: {code} 2017-01-11 19:42:05,778 WARN org.apache.flink.runtime.webmonitor.metrics.MetricFetcher - Fetching metrics failed. akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/MetricQueryService#-970662317]] after [1 ms] at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691) at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:474) at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:425) at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:429) at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:381) at java.lang.Thread.run(Thread.java:745) 2017-01-11 19:42:07,765 WARN org.apache.flink.runtime.metrics.dump.MetricQueryService - An exception occurred while processing a message. 
java.lang.NullPointerException at org.apache.flink.runtime.metrics.dump.MetricDumpSerialization.serializeGauge(MetricDumpSerialization.java:162) at org.apache.flink.runtime.metrics.dump.MetricDumpSerialization.access$300(MetricDumpSerialization.java:47) at org.apache.flink.runtime.metrics.dump.MetricDumpSerialization$MetricDumpSerializer.serialize(MetricDumpSerialization.java:90) at org.apache.flink.runtime.metrics.dump.MetricQueryService.onReceive(MetricQueryService.java:109) at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167) at akka.actor.Actor$class.aroundReceive(Actor.scala:467) at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
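The report does not state why `serializeGauge` throws. One plausible cause, offered here purely as an assumption, is a gauge whose value is currently `null` being converted to a string during the dump. The sketch below shows a null-safe variant of such a dump. `NullSafeGaugeDump`, the `Supplier`-based gauge model, and the metric names are hypothetical and are not Flink's `MetricDumpSerialization` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

class NullSafeGaugeDump {
    /** Serializes gauge values to strings, skipping gauges that currently report null. */
    static Map<String, String> dump(Map<String, Supplier<Object>> gauges) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<Object>> entry : gauges.entrySet()) {
            Object value = entry.getValue().get();
            if (value == null) {
                continue; // value.toString() here would throw NullPointerException
            }
            out.put(entry.getKey(), value.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Supplier<Object>> gauges = new LinkedHashMap<>();
        gauges.put("numRecordsIn", () -> 42L);        // hypothetical metric name
        gauges.put("lastCheckpointPath", () -> null); // hypothetical gauge with no value yet
        System.out.println(dump(gauges)); // prints {numRecordsIn=42}
    }
}
```

Skipping the entry keeps a single misbehaving gauge from failing the whole dump, so the metric fetcher on the other side is less likely to run into the ask timeout.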
[jira] [Created] (FLINK-5465) RocksDB fails with segfault while calling AbstractRocksDBState.clear()
Robert Metzger created FLINK-5465: - Summary: RocksDB fails with segfault while calling AbstractRocksDBState.clear() Key: FLINK-5465 URL: https://issues.apache.org/jira/browse/FLINK-5465 Project: Flink Issue Type: Bug Components: State Backends, Checkpointing Affects Versions: 1.2.0 Reporter: Robert Metzger I'm using Flink 699f4b0. {code} # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7f91a0d49b78, pid=26662, tid=140263356024576 # # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01) # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops) # Problematic frame: # C [librocksdbjni-linux64.so+0x1aeb78] rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8 # # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again # # An error report file with more information is saved as: # /yarn/nm/usercache/robert/appcache/application_1484132267957_0007/container_1484132267957_0007_01_10/hs_err_pid26662.log Compiled method (nm) 1869778 903 n org.rocksdb.RocksDB::remove (native) total in heap [0x7f91b40b9dd0,0x7f91b40ba150] = 896 relocation [0x7f91b40b9ef0,0x7f91b40b9f48] = 88 main code [0x7f91b40b9f60,0x7f91b40ba150] = 496 # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # The crash happened outside the Java Virtual Machine in native code. # See problematic frame for where to report the bug. # {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5465) RocksDB fails with segfault while calling AbstractRocksDBState.clear()
[ https://issues.apache.org/jira/browse/FLINK-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5465: -- Attachment: hs-err-pid26662.log I've attached the error report file, containing the stack trace {code} Stack: [0x7f919b72c000,0x7f919b82d000], sp=0x7f919b82b368, free space=1020k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [librocksdbjni-linux64.so+0x1aeb78] rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8 C [librocksdbjni-linux64.so+0x2009e1] rocksdb::DB::Delete(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&)+0x41 C [librocksdbjni-linux64.so+0x200a41] rocksdb::DBImpl::Delete(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&)+0x11 C [librocksdbjni-linux64.so+0x18e993] rocksdb_remove_helper(JNIEnv_*, rocksdb::DB*, rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int)+0x63 C [librocksdbjni-linux64.so+0x18eee6] Java_org_rocksdb_RocksDB_remove__JJ_3BIJ+0x26 J 903 org.rocksdb.RocksDB.remove(JJ[BIJ)V (0 bytes) @ 0x7f91b40ba02d [0x7f91b40b9f60+0xcd] Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) J 903 org.rocksdb.RocksDB.remove(JJ[BIJ)V (0 bytes) @ 0x7f91b40b9fb3 [0x7f91b40b9f60+0x53] J 900 C2 org.apache.flink.contrib.streaming.state.AbstractRocksDBState.clear()V (47 bytes) @ 0x7f91b4143f18 [0x7f91b4143c60+0x2b8] J 1087 C2 org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.cleanup(Lorg/apache/flink/streaming/api/windowing/windows/Window;Lorg/apache/flink/api/common/state/AppendingState;Lorg/apache/flink/streaming/runtime/operators/windowing/MergingWindowSet;)V (27 bytes) @ 0x7f91b414dc64 [0x7f91b414dc20+0x44] J 925 C2 org.apache.flink.streaming.api.operators.HeapInternalTimerService.onProcessingTime(J)V (107 bytes) @ 0x7f91b4194794 [0x7f91b4194560+0x234] j 
org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run()V+15 j java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4 j java.util.concurrent.FutureTask.run()V+42 j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1 j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+30 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub {code} > RocksDB fails with segfault while calling AbstractRocksDBState.clear() > -- > > Key: FLINK-5465 > URL: https://issues.apache.org/jira/browse/FLINK-5465 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Robert Metzger > Attachments: hs-err-pid26662.log > > > I'm using Flink 699f4b0. > {code} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f91a0d49b78, pid=26662, tid=140263356024576 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build > 1.7.0_67-b01) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C [librocksdbjni-linux64.so+0x1aeb78] > rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8 > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # > /yarn/nm/usercache/robert/appcache/application_1484132267957_0007/container_1484132267957_0007_01_10/hs_err_pid26662.log > Compiled method (nm) 1869778 903 n org.rocksdb.RocksDB::remove > (native) > total in heap [0x7f91b40b9dd0,0x7f91b40ba150] = 896 > relocation [0x7f91b40b9ef0,0x7f91b40b9f48] = 88 > main code [0x7f91b40b9f60,0x7f91b40ba150] = 496 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.sun.com/bugreport/crash.jsp > # The crash happened outside the Java Virtual Machine in native code. > # See problematic frame for where to report the bug. > # > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5462) Flink job fails due to java.util.concurrent.CancellationException while snapshotting
[ https://issues.apache.org/jira/browse/FLINK-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5462: -- Attachment: application-1484132267957-0005 I've attached the full log. > Flink job fails due to java.util.concurrent.CancellationException while > snapshotting > > > Key: FLINK-5462 > URL: https://issues.apache.org/jira/browse/FLINK-5462 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Robert Metzger > Attachments: application-1484132267957-0005 > > > I'm using Flink 699f4b0. > My restored, rescaled Flink job failed while creating a checkpoint with the > following exception: > {code} > 2017-01-11 18:46:49,853 INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering > checkpoint 3 @ 1484160409846 > 2017-01-11 18:49:50,111 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph- > TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream > .apply(AllWindowedStream.java:440)) (1/1) (2accc6ca2727c4f7ec963318fbd237e9) > switched from RUNNING to FAILED. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 > for operator TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.ap > ply(AllWindowedStream.java:440)) (1/1).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 3 for > operator TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.apply(AllWind > owedStream.java:440)) (1/1). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:899) > ... 5 more > 2017-01-11 18:49:50,113 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph- Job Generate > Event Window stream (90859d392c1da472e07695f434b332ef) switched from state > RUNNING to FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 > for operator TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.ap > ply(AllWindowedStream.java:440)) (1/1).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 3 for > operator TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:899) > ... 5 more > 2017-01-11 18:49:50,122 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph
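The innermost cause in the trace above is a `java.util.concurrent.CancellationException` raised from `FutureTask.get()`. As a minimal, JDK-only illustration (this is not Flink code, and the class and method names are made up for the sketch), calling `get()` on a `FutureTask` that was cancelled beforehand raises exactly this exception. That would be consistent with the snapshot future having been cancelled before the asynchronous part of the checkpoint tried to collect its result, although the log alone does not prove it.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical demo class; only the java.util.concurrent behaviour is real.
final class CancelledFutureDemo {

    /** Returns true if get() on an already-cancelled FutureTask throws CancellationException. */
    static boolean getAfterCancelThrows() {
        // Stands in for a snapshot future; the callable is never executed.
        FutureTask<String> snapshot = new FutureTask<>(() -> "state handle");
        snapshot.cancel(true); // cancelled before anyone asked for the result
        try {
            snapshot.get();
            return false;
        } catch (CancellationException expected) {
            return true;
        } catch (ExecutionException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("get() after cancel throws: " + getAfterCancelThrows());
    }
}
```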
[jira] [Created] (FLINK-5468) Restoring from a semi async rocksdb statebackend (1.1) to 1.2 fails with ClassNotFoundException
Robert Metzger created FLINK-5468: - Summary: Restoring from a semi async rocksdb statebackend (1.1) to 1.2 fails with ClassNotFoundException Key: FLINK-5468 URL: https://issues.apache.org/jira/browse/FLINK-5468 Project: Flink Issue Type: Bug Components: State Backends, Checkpointing Affects Versions: 1.2.0 Reporter: Robert Metzger I think we should catch this exception and explain what's going on and how users can resolve the issue. {code} org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400) at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) at com.dataartisans.eventwindow.Generator.main(Generator.java:60) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073) at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117) at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116) Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed at org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:328) at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:382) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423) ... 22 more Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.flink.contrib.streaming.state.RocksDBStateBackend$FinalSemiAsyncSnapshot at org.apache.flink.migration.runtime.checkpoint.savepoint.SavepointV0Serializer.deserialize(SavepointV0Serializer.java:162) at org.apache.flink.migration.runtime.checkpoint.savepoint.SavepointV0Serializer.deserialize(SavepointV0Serializer.java:70) at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:138) at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at 
akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.ClassNotFoundException: org.apache.flink.contrib.streamin
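The report suggests catching this exception and explaining what is going on. A rough sketch of that idea, under the assumption that the loader can intercept the `ClassNotFoundException` at the point where it resolves a class name: `SavepointClassResolver`, `resolve`, and the message wording are all hypothetical and do not reflect Flink's actual savepoint code.

```java
import java.io.IOException;

// Hypothetical helper; it only shows wrapping the low-level error in an explanatory one.
final class SavepointClassResolver {

    /** Resolves a class named in a savepoint, or fails with a message aimed at the user. */
    static Class<?> resolve(String className) throws IOException {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // Message wording is an assumption; real guidance would come from the project.
            throw new IOException(
                "Could not load class '" + className + "' while reading the savepoint. "
                    + "The savepoint may reference a class that does not exist in this version.",
                e);
        }
    }

    public static void main(String[] args) {
        try {
            resolve("org.example.MissingSnapshotClass");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The original exception is kept as the cause, so the full stack trace remains available for debugging.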
[jira] [Created] (FLINK-5473) setMaxParallelism() higher than 1 is possible on non-parallel operators
Robert Metzger created FLINK-5473:
-------------------------------------

Summary: setMaxParallelism() higher than 1 is possible on non-parallel operators
Key: FLINK-5473
URL: https://issues.apache.org/jira/browse/FLINK-5473
Project: Flink
Issue Type: Bug
Components: DataStream API
Affects Versions: 1.2.0
Reporter: Robert Metzger

While trying out Flink 1.2, I found out that you can set a maxParallelism higher than 1 on a non-parallel operator.
I think we should have the same semantics as the setParallelism() method.

Also, when setting a global maxParallelism in the execution environment, it will be set as a default value for the non-parallel operator.
When restoring a savepoint from 1.1, you have to set the maxParallelism to the parallelism of the 1.1 job. Non-parallel operators will then also get the maxPar set to this value, leading to an error on restore.

So currently, users restoring from 1.1 to 1.2 have to manually set the maxParallelism to 1 for all non-parallel operators.
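The behaviour asked for above could look roughly like the following sketch. It assumes that "the same semantics as setParallelism()" means rejecting any value other than 1 on a non-parallel operator, and that a global default should not be applied to such operators. `MaxParallelismCheck`, `validate`, and `defaultFor` are hypothetical names, not Flink API.

```java
// Hypothetical sketch, not Flink code.
final class MaxParallelismCheck {

    /**
     * Validates a user-supplied max parallelism.
     * Assumption: a non-parallel operator only accepts 1.
     */
    static int validate(boolean parallelOperator, int requestedMaxParallelism) {
        if (requestedMaxParallelism < 1) {
            throw new IllegalArgumentException("maxParallelism must be at least 1");
        }
        if (!parallelOperator && requestedMaxParallelism != 1) {
            throw new IllegalArgumentException(
                "The maximum parallelism of a non-parallel operator must be 1.");
        }
        return requestedMaxParallelism;
    }

    /** Default derived from the environment-wide setting; non-parallel operators always get 1. */
    static int defaultFor(boolean parallelOperator, int globalMaxParallelism) {
        return parallelOperator ? globalMaxParallelism : 1;
    }

    public static void main(String[] args) {
        System.out.println(defaultFor(true, 128));  // parallel operator keeps the global value
        System.out.println(defaultFor(false, 128)); // non-parallel operator is pinned to 1
    }
}
```

With a default like `defaultFor`, a user restoring a 1.1 savepoint would only set the global value and would not need to touch each non-parallel operator by hand.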
[jira] [Commented] (FLINK-5473) setMaxParallelism() higher than 1 is possible on non-parallel operators
[ https://issues.apache.org/jira/browse/FLINK-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821262#comment-15821262 ] Robert Metzger commented on FLINK-5473: --- That would be the best user experience, yes. > setMaxParallelism() higher than 1 is possible on non-parallel operators > --- > > Key: FLINK-5473 > URL: https://issues.apache.org/jira/browse/FLINK-5473 > Project: Flink > Issue Type: Bug > Components: DataStream API >Affects Versions: 1.2.0 >Reporter: Robert Metzger > > While trying out Flink 1.2, I found out that you can set a maxParallelism > higher than 1 on a non-parallel operator. > I think we should have the same semantics as the setParallelism() method. > Also, when setting a global maxParallelism in the execution environment, it > will be set as a default value for the non-parallel operator. > When restoring a savepoint from 1.1, you have to set the maxParallelism to > the parallelism of the 1.1 job. Non-parallel operators will then also get the > maxPar set to this value, leading to an error on restore. > So currently, users restoring from 1.1 to 1.2 have to manually set the > maxParallelism to 1 for all non-parallel operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-5465) RocksDB fails with segfault while calling AbstractRocksDBState.clear()
[ https://issues.apache.org/jira/browse/FLINK-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-5465. --- Resolution: Won't Fix Thank you guys for looking into it. I'm closing the issue as won't fix. > RocksDB fails with segfault while calling AbstractRocksDBState.clear() > -- > > Key: FLINK-5465 > URL: https://issues.apache.org/jira/browse/FLINK-5465 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Robert Metzger > Attachments: hs-err-pid26662.log > > > I'm using Flink 699f4b0. > {code} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f91a0d49b78, pid=26662, tid=140263356024576 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build > 1.7.0_67-b01) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C [librocksdbjni-linux64.so+0x1aeb78] > rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8 > # > # Failed to write core dump. Core dumps have been disabled. To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # > /yarn/nm/usercache/robert/appcache/application_1484132267957_0007/container_1484132267957_0007_01_10/hs_err_pid26662.log > Compiled method (nm) 1869778 903 n org.rocksdb.RocksDB::remove > (native) > total in heap [0x7f91b40b9dd0,0x7f91b40ba150] = 896 > relocation [0x7f91b40b9ef0,0x7f91b40b9f48] = 88 > main code [0x7f91b40b9f60,0x7f91b40ba150] = 496 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.sun.com/bugreport/crash.jsp > # The crash happened outside the Java Virtual Machine in native code. > # See problematic frame for where to report the bug. > # > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3427) Add watermark monitoring to JobManager web frontend
[ https://issues.apache.org/jira/browse/FLINK-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822779#comment-15822779 ]

Robert Metzger commented on FLINK-3427:
---------------------------------------

Hi Ivan,

Sadly, the person who wanted to work on this decided not to do it. So if you are still interested, feel free to assign it to yourself again.

To answer your question: I was thinking of adding a new tab for “Event time”. It will show the same plan as the “Plan” view, but with the low watermark of each operator. (We’ll compute the low watermark for the task in the web interface based on the watermarks reported by the subtasks.) There should also be a view showing the subtasks' individual watermarks and the max watermark across the subtasks.

> Add watermark monitoring to JobManager web frontend
> ---------------------------------------------------
>
> Key: FLINK-3427
> URL: https://issues.apache.org/jira/browse/FLINK-3427
> Project: Flink
> Issue Type: Improvement
> Components: Streaming, Webfrontend
> Reporter: Robert Metzger
>
> Currently, it's quite hard to figure out issues with the watermarks.
> I think we can improve the situation by reporting the following information through the metrics system:
> - Report the current low watermark for each operator (this way, you can see if one operator is preventing the watermarks from rising)
> - Report the number of events that arrived after the low watermark (users can see how accurate the watermarks are)
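The per-task aggregation described in the comment can be sketched as follows. The comment does not say how the task's low watermark is derived, so taking the minimum over the subtask watermarks is an assumption here, and `WatermarkSummary` with its methods is a hypothetical name rather than anything in the web frontend.

```java
import java.util.Arrays;

// Hypothetical aggregation helper for watermarks reported by subtasks (epoch millis).
final class WatermarkSummary {

    /** Assumed definition: the task's low watermark is the smallest subtask watermark. */
    static long lowWatermark(long[] subtaskWatermarks) {
        return Arrays.stream(subtaskWatermarks).min().orElse(Long.MIN_VALUE);
    }

    /** Largest watermark reported by any subtask. */
    static long maxWatermark(long[] subtaskWatermarks) {
        return Arrays.stream(subtaskWatermarks).max().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long[] reported = {1484160409846L, 1484160400000L, 1484160409000L};
        System.out.println("low = " + lowWatermark(reported));
        System.out.println("max = " + maxWatermark(reported));
    }
}
```

A large gap between the two values would point at a single lagging subtask, which is the situation the issue wants to make visible.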
[jira] [Resolved] (FLINK-5489) maven release:prepare fails due to invalid JDOM comments in pom.xml
[ https://issues.apache.org/jira/browse/FLINK-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger resolved FLINK-5489.
-----------------------------------
Resolution: Fixed
Fix Version/s: 1.3.0

Thank you for fixing this!
Resolved in master with commit http://git-wip-us.apache.org/repos/asf/flink/commit/e2ba042c

> maven release:prepare fails due to invalid JDOM comments in pom.xml
> -------------------------------------------------------------------
>
> Key: FLINK-5489
> URL: https://issues.apache.org/jira/browse/FLINK-5489
> Project: Flink
> Issue Type: Bug
> Components: Build System
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Priority: Minor
> Labels: newbie
> Fix For: 1.3.0
>
> When I was trying to publish Flink to our internal artifactory, I found that {{maven release:prepare}} failed because the plugin complains that some of the comments in pom.xml do not conform to the JDOM format:
> {noformat}
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.4.2:prepare (default-cli) on project flink-parent: Execution default-cli of goal org.apache.maven.plugins:maven-release-plugin:2.4.2:prepare failed: The data "-
> [ERROR] This module is used a dependency in the root pom. It activates shading for all sub modules
> [ERROR] through an include rule in the shading configuration. This assures that Maven always generates
> [ERROR] an effective pom for all modules, i.e. get rid of Maven properties. In particular, this is needed
> [ERROR] to define the Scala version property in the root pom but not let the root pom depend on Scala
> [ERROR] and thus be suffixed along with all other modules.
> [ERROR] " is not legal for a JDOM comment: Comment data cannot start with a hyphen.
> {noformat}
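The error text says the comment data begins with `-`, which is what happens when a comment is opened with three hyphens (`<!---`) instead of two. The sketch below models only that rule, plus the XML 1.0 rule that `--` must not occur inside a comment. It is a stand-alone approximation written for illustration: `CommentDataCheck` and `check` are hypothetical names and this is not the JDOM implementation.

```java
// Hypothetical stand-alone check; approximates two comment rules, not JDOM itself.
final class CommentDataCheck {

    /** Returns null if the comment data passes both rules, otherwise the reason it fails. */
    static String check(String data) {
        if (data.startsWith("-")) {
            return "Comment data cannot start with a hyphen.";
        }
        if (data.contains("--")) {
            return "Comment data cannot contain two hyphens in a row.";
        }
        return null;
    }

    public static void main(String[] args) {
        // "<!--- text -->" yields the data "- text ", which is rejected.
        System.out.println(check("- text "));
        // "<!-- text -->" yields the data " text ", which is accepted.
        System.out.println(check(" text "));
    }
}
```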
[jira] [Updated] (FLINK-5489) maven release:prepare fails due to invalid JDOM comments in pom.xml
[ https://issues.apache.org/jira/browse/FLINK-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-5489:
----------------------------------
Component/s: Build System

> maven release:prepare fails due to invalid JDOM comments in pom.xml
> -------------------------------------------------------------------
>
> Key: FLINK-5489
> URL: https://issues.apache.org/jira/browse/FLINK-5489
> Project: Flink
> Issue Type: Bug
> Components: Build System
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Priority: Minor
> Labels: newbie
> Fix For: 1.3.0
>
> When I was trying to publish Flink to our internal artifactory, I found that {{maven release:prepare}} failed because the plugin complains that some of the comments in pom.xml do not conform to the JDOM format:
> {noformat}
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.4.2:prepare (default-cli) on project flink-parent: Execution default-cli of goal org.apache.maven.plugins:maven-release-plugin:2.4.2:prepare failed: The data "-
> [ERROR] This module is used a dependency in the root pom. It activates shading for all sub modules
> [ERROR] through an include rule in the shading configuration. This assures that Maven always generates
> [ERROR] an effective pom for all modules, i.e. get rid of Maven properties. In particular, this is needed
> [ERROR] to define the Scala version property in the root pom but not let the root pom depend on Scala
> [ERROR] and thus be suffixed along with all other modules.
> [ERROR] " is not legal for a JDOM comment: Comment data cannot start with a hyphen.
> {noformat}
[jira] [Updated] (FLINK-4959) Write Documentation for ProcessFunction
[ https://issues.apache.org/jira/browse/FLINK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-4959:
----------------------------------
Priority: Critical (was: Blocker)

> Write Documentation for ProcessFunction
> ---------------------------------------
>
> Key: FLINK-4959
> URL: https://issues.apache.org/jira/browse/FLINK-4959
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 1.2.0
> Reporter: Aljoscha Krettek
> Priority: Critical
> Fix For: 1.2.0
[jira] [Updated] (FLINK-4959) Write Documentation for ProcessFunction
[ https://issues.apache.org/jira/browse/FLINK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-4959:
----------------------------------
Issue Type: Sub-task (was: Improvement)
Parent: FLINK-5430

> Write Documentation for ProcessFunction
> ---------------------------------------
>
> Key: FLINK-4959
> URL: https://issues.apache.org/jira/browse/FLINK-4959
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 1.2.0
> Reporter: Aljoscha Krettek
> Priority: Blocker
> Fix For: 1.2.0
[jira] [Updated] (FLINK-5473) setMaxParallelism() higher than 1 is possible on non-parallel operators
[ https://issues.apache.org/jira/browse/FLINK-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5473: -- Assignee: Stefan Richter > setMaxParallelism() higher than 1 is possible on non-parallel operators > --- > > Key: FLINK-5473 > URL: https://issues.apache.org/jira/browse/FLINK-5473 > Project: Flink > Issue Type: Bug > Components: DataStream API >Affects Versions: 1.2.0 >Reporter: Robert Metzger >Assignee: Stefan Richter > > While trying out Flink 1.2, I found out that you can set a maxParallelism > higher than 1 on a non-parallel operator. > I think we should have the same semantics as the setParallelism() method. > Also, when setting a global maxParallelism in the execution environment, it > will be set as a default value for the non-parallel operator. > When restoring a savepoint from 1.1, you have to set the maxParallelism to > the parallelism of the 1.1 job. Non-parallel operators will then also get the > maxPar set to this value, leading to an error on restore. > So currently, users restoring from 1.1 to 1.2 have to manually set the > maxParallelism to 1 for all non-parallel operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5463) RocksDB.disposeInternal does not react to interrupts, blocks task cancellation
[ https://issues.apache.org/jira/browse/FLINK-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823972#comment-15823972 ] Robert Metzger commented on FLINK-5463: --- After talking with [~srichter] about this, we decided to wait and see how frequently this error occurs. > RocksDB.disposeInternal does not react to interrupts, blocks task cancellation > -- > > Key: FLINK-5463 > URL: https://issues.apache.org/jira/browse/FLINK-5463 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Robert Metzger > > I'm using Flink 699f4b0. > My Flink job is slow to cancel because RocksDB seems to be busy > disposing its state: > {code} > 2017-01-11 18:48:23,315 INFO org.apache.flink.runtime.taskmanager.Task > - Triggering cancellation of task code > TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071 > }, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) > (1/1) (2accc6ca2727c4f7ec963318fbd237e9). 
> 2017-01-11 18:48:53,318 WARN org.apache.flink.runtime.taskmanager.Task > - Task 'TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), Windowed > Stream.apply(AllWindowedStream.java:440)) (1/1)' did not react to cancelling > signal, but is stuck in method: > org.rocksdb.RocksDB.disposeInternal(Native Method) > org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37) > org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:56) > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:250) > org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:331) > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:169) > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.dispose(WindowOperator.java:273) > org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:439) > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:340) > org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) > java.lang.Thread.run(Thread.java:745) > 2017-01-11 18:48:53,319 WARN org.apache.flink.runtime.taskmanager.Task > - Task 'TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1)' > did not react to cancelling signal, but is stuck in method: > org.rocksdb.RocksDB.disposeInternal(Native Method) > org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37) > org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:56) > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:250) > 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:331) > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:169) > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.dispose(WindowOperator.java:273) > org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:439) > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:340) > org.apache.flink.runtime.taskmanager.Task.run(Task.java:654) > java.lang.Thread.run(Thread.java:745) > 2017-01-11 18:49:23,319 WARN org.apache.flink.runtime.taskmanager.Task > - Task 'TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1)' > did not react to cancelling signal, but is stuck in method: > org.rocksdb.RocksDB.disposeInternal(Native Method) > org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37) > org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:56) > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:250) > org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:331) > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:169) > org.apache.flink.streaming.runtime.operators.windowing.
[jira] [Commented] (FLINK-5345) IOManager failed to properly clean up temp file directory
[ https://issues.apache.org/jira/browse/FLINK-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823974#comment-15823974 ] Robert Metzger commented on FLINK-5345: --- [~tonycox] What's the progress on fixing this issue? > IOManager failed to properly clean up temp file directory > - > > Key: FLINK-5345 > URL: https://issues.apache.org/jira/browse/FLINK-5345 > Project: Flink > Issue Type: Bug >Affects Versions: 1.1.3 >Reporter: Robert Metzger >Assignee: Anton Solovev > Labels: simplex, starter > Fix For: 1.2.0, 1.3.0 > > > While testing 1.1.3 RC3, I have the following message in my log: > {code} > 2016-12-15 14:46:05,450 INFO > org.apache.flink.streaming.runtime.tasks.StreamTask - Timer service > is shutting down. > 2016-12-15 14:46:05,452 INFO org.apache.flink.runtime.taskmanager.Task > - Source: control events generator (29/40) > (73915a232ba09e642f9dff92f8c8773a) switched from CANCELING to CANCELED. > 2016-12-15 14:46:05,452 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Source: control events generator > (29/40) (73915a232ba09e642f9dff92f8c8773a). > 2016-12-15 14:46:05,454 INFO org.apache.flink.yarn.YarnTaskManager > - Un-registering task and sending final execution state > CANCELED to JobManager for task Source: control events genera > tor (73915a232ba09e642f9dff92f8c8773a) > 2016-12-15 14:46:40,609 INFO org.apache.flink.yarn.YarnTaskManagerRunner > - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested. > 2016-12-15 14:46:40,611 INFO org.apache.flink.runtime.blob.BlobCache > - Shutting down BlobCache > 2016-12-15 14:46:40,724 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@10.240.0.34:33635] has failed, address is now gated for > [5000] ms. > Reason is: [Disassociated]. 
> 2016-12-15 14:46:40,808 ERROR > org.apache.flink.runtime.io.disk.iomanager.IOManager - IOManager > failed to properly clean up temp file directory: > /yarn/nm/usercache/robert/appcache/application_148129128 > 9979_0024/flink-io-f0ff3f66-b9e2-4560-881f-2ab43bc448b5 > java.lang.IllegalArgumentException: > /yarn/nm/usercache/robert/appcache/application_1481291289979_0024/flink-io-f0ff3f66-b9e2-4560-881f-2ab43bc448b5/62e14e1891fe1e334c921dfd19a32a84/StreamMap_11_24/dummy_state > does not exist > at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637) > at > org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535) > at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270) > at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653) > at > org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535) > at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270) > at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653) > at > org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535) > at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270) > at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653) > at > org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535) > at > org.apache.flink.runtime.io.disk.iomanager.IOManager.shutdown(IOManager.java:109) > at > org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync.shutdown(IOManagerAsync.java:185) > at > org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync$1.run(IOManagerAsync.java:105) > {code} > This was the last message logged from that machine. I suspect two threads are > trying to clean up the directories during shutdown? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
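A note on the suspected race: if two threads delete the same directory tree at once, one of them can find that an entry it was about to clean has already disappeared, which matches the "does not exist" failure in the trace. The sketch below is not Flink's IOManager code. It is a minimal, JDK-only illustration of a recursive delete that treats "already gone" as success; the class name `TolerantDelete` and the method `deleteRecursively` are made up for this example.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of Flink or commons-io.
class TolerantDelete {

    // Deletes root and everything below it. Entries that vanish while the
    // walk is running (for example because another thread deleted them)
    // are skipped instead of being reported as errors.
    static void deleteRecursively(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.deleteIfExists(file); // no error if already deleted
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc)
                    throws IOException {
                if (exc instanceof NoSuchFileException) {
                    return FileVisitResult.CONTINUE; // someone else removed it
                }
                throw exc;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null && !(exc instanceof NoSuchFileException)) {
                    throw exc;
                }
                Files.deleteIfExists(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("flink-io-sketch");
        Files.createDirectories(root.resolve("StreamMap/dummy_state"));
        deleteRecursively(root);
        deleteRecursively(root); // second call finds nothing and returns quietly
        System.out.println("exists after cleanup: " + Files.exists(root));
    }
}
```

Calling the method twice in `main` stands in for the second shutdown thread. Whether the real fix should tolerate missing entries or prevent the double cleanup in the first place is a separate design question that this sketch does not answer.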
[jira] [Resolved] (FLINK-4941) Show ship strategy in web interface
[ https://issues.apache.org/jira/browse/FLINK-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-4941. --- Resolution: Fixed This won't be backported to 1.1. > Show ship strategy in web interface > --- > > Key: FLINK-4941 > URL: https://issues.apache.org/jira/browse/FLINK-4941 > Project: Flink > Issue Type: Bug > Components: Webfrontend >Reporter: Robert Metzger >Assignee: Robert Metzger > Fix For: 1.2.0 > > Attachments: ship_strategy_bad.png, ship_strategy_good.png > > > Currently, only Flink's Plan visualizer shows the ship strategy of a > streaming job; the web interface does not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5549) TypeExtractor fails with RuntimeException, but should use GGenericTypeInfoeneric
Robert Metzger created FLINK-5549: - Summary: TypeExtractor fails with RuntimeException, but should use GGenericTypeInfoeneric Key: FLINK-5549 URL: https://issues.apache.org/jira/browse/FLINK-5549 Project: Flink Issue Type: Bug Components: Type Serialization System Affects Versions: 1.2.0 Reporter: Robert Metzger This issue has been reported by a user on StackOverflow: http://stackoverflow.com/questions/41700568/runtimeexception-when-using-flinks-mapfunction-with-cassandra-insert-but-not {code} Exception in thread "main" java.lang.RuntimeException: The field private java.util.List com.datastax.driver.core.querybuilder.BuiltStatement.values is already contained in the hierarchy of the class com.datastax.driver.core.querybuilder.BuiltStatement.Please use unique field names through your classes hierarchy at org.apache.flink.api.java.typeutils.TypeExtractor.getAllDeclaredFields(TypeExtractor.java:1762) at org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1683) at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1580) at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1479) at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:737) at org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:565) at org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:366) at org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:305) at org.apache.flink.api.java.typeutils.TypeExtractor.getMapReturnTypes(TypeExtractor.java:120) at org.apache.flink.streaming.api.datastream.DataStream.map(DataStream.java:506) at se.hiq.bjornper.testenv.cassandra.SOCassandraQueryTest.main(SOCassandraQueryTest.java:51) {code} When Flink is trying to analyze the POJO, the {{getAllDeclaredFields}} method fails with a RuntimeException. 
When the user uses a different class that contains the POJO, Flink just uses the GenericTypeInfo, which is able to serialize the type. I think we need to throw an InvalidTypesException in the {{getAllDeclaredFields}} method to fix the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
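The proposed behaviour can be sketched without Flink on the classpath. The example below is an illustration only, not the TypeExtractor implementation: it walks a class hierarchy, signals a duplicate field name with a dedicated exception, and lets the caller fall back to a generic treatment when that exception is raised. `InvalidTypeSketchException`, `PojoFallbackSketch` and `describe` are hypothetical names, and the nested `Base`/`Derived` classes only mimic the shape of the `BuiltStatement` hierarchy from the report.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names do not correspond to Flink classes.
class PojoFallbackSketch {

    // Stand-in for a dedicated "this type cannot be analyzed" exception.
    static class InvalidTypeSketchException extends RuntimeException {
        InvalidTypeSketchException(String message) {
            super(message);
        }
    }

    // Collects instance fields of clazz and its superclasses, rejecting
    // field names that occur more than once in the hierarchy.
    static List<Field> getAllDeclaredFields(Class<?> clazz) {
        List<Field> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Class<?> c = clazz; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                    continue; // not part of the object's data
                }
                if (!seen.add(f.getName())) {
                    throw new InvalidTypeSketchException(
                            "Field " + f.getName() + " is declared twice in the hierarchy of "
                                    + clazz.getName());
                }
                result.add(f);
            }
        }
        return result;
    }

    // Caller side: a specific exception can be caught and turned into a
    // fallback, whereas a plain RuntimeException would abort the analysis.
    static String describe(Class<?> clazz) {
        try {
            return "pojo with " + getAllDeclaredFields(clazz).size() + " fields";
        } catch (InvalidTypeSketchException e) {
            return "generic type";
        }
    }

    static class Base { private List<Object> values; }
    static class Derived extends Base { private List<Object> values; }
    static class Plain { int a; String b; }

    public static void main(String[] args) {
        System.out.println(describe(Plain.class));
        System.out.println(describe(Derived.class));
    }
}
```

The point of the design is only that the failure becomes distinguishable: code higher up can decide to treat the class generically instead of failing the whole job submission.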
[jira] [Updated] (FLINK-5549) TypeExtractor fails with RuntimeException, but should use GenericTypeInfoeneric
[ https://issues.apache.org/jira/browse/FLINK-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5549: -- Summary: TypeExtractor fails with RuntimeException, but should use GenericTypeInfoeneric (was: TypeExtractor fails with RuntimeException, but should use GGenericTypeInfoeneric) > TypeExtractor fails with RuntimeException, but should use > GenericTypeInfoeneric > --- > > Key: FLINK-5549 > URL: https://issues.apache.org/jira/browse/FLINK-5549 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 1.2.0 >Reporter: Robert Metzger > > This issue has been reported by a user on StackOverflow: > http://stackoverflow.com/questions/41700568/runtimeexception-when-using-flinks-mapfunction-with-cassandra-insert-but-not > {code} > Exception in thread "main" java.lang.RuntimeException: The field private > java.util.List com.datastax.driver.core.querybuilder.BuiltStatement.values is > already contained in the hierarchy of the class > com.datastax.driver.core.querybuilder.BuiltStatement.Please use unique field > names through your classes hierarchy > at > org.apache.flink.api.java.typeutils.TypeExtractor.getAllDeclaredFields(TypeExtractor.java:1762) > at > org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1683) > at > org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1580) > at > org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1479) > at > org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:737) > at > org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:565) > at > org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:366) > at > org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:305) > at > 
org.apache.flink.api.java.typeutils.TypeExtractor.getMapReturnTypes(TypeExtractor.java:120) > at > org.apache.flink.streaming.api.datastream.DataStream.map(DataStream.java:506) > at > se.hiq.bjornper.testenv.cassandra.SOCassandraQueryTest.main(SOCassandraQueryTest.java:51) > {code} > When Flink is trying to analyze the POJO, the {{getAllDeclaredFields}} method > fails with a RuntimeException. When the user is using a different class that > contains the POJO, it just uses the GenericTypeInfo which is able to > serialize the type. > I think we need to throw an InvalidTypesException in the > {{getAllDeclaredFields}} method to fix the issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5549) TypeExtractor fails with RuntimeException, but should use GenericTypeInfo
[ https://issues.apache.org/jira/browse/FLINK-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5549: -- Summary: TypeExtractor fails with RuntimeException, but should use GenericTypeInfo (was: TypeExtractor fails with RuntimeException, but should use GenericTypeInfoeneric) > TypeExtractor fails with RuntimeException, but should use GenericTypeInfo > - > > Key: FLINK-5549 > URL: https://issues.apache.org/jira/browse/FLINK-5549 > Project: Flink > Issue Type: Bug > Components: Type Serialization System >Affects Versions: 1.2.0 >Reporter: Robert Metzger > > This issue has been reported by a user on StackOverflow: > http://stackoverflow.com/questions/41700568/runtimeexception-when-using-flinks-mapfunction-with-cassandra-insert-but-not > {code} > Exception in thread "main" java.lang.RuntimeException: The field private > java.util.List com.datastax.driver.core.querybuilder.BuiltStatement.values is > already contained in the hierarchy of the class > com.datastax.driver.core.querybuilder.BuiltStatement.Please use unique field > names through your classes hierarchy > at > org.apache.flink.api.java.typeutils.TypeExtractor.getAllDeclaredFields(TypeExtractor.java:1762) > at > org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1683) > at > org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1580) > at > org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1479) > at > org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:737) > at > org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:565) > at > org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:366) > at > org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:305) > at > 
org.apache.flink.api.java.typeutils.TypeExtractor.getMapReturnTypes(TypeExtractor.java:120) > at > org.apache.flink.streaming.api.datastream.DataStream.map(DataStream.java:506) > at > se.hiq.bjornper.testenv.cassandra.SOCassandraQueryTest.main(SOCassandraQueryTest.java:51) > {code} > When Flink is trying to analyze the POJO, the {{getAllDeclaredFields}} method > fails with a RuntimeException. When the user is using a different class that > contains the POJO, it just uses the GenericTypeInfo which is able to > serialize the type. > I think we need to throw an InvalidTypesException in the > {{getAllDeclaredFields}} method to fix the issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-3150) Make YARN container invocation configurable
[ https://issues.apache.org/jira/browse/FLINK-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-3150. --- Resolution: Fixed Fix Version/s: 1.3.0 Thanks a lot for implementing this. Merged in master: http://git-wip-us.apache.org/repos/asf/flink/commit/8f4139a4 > Make YARN container invocation configurable > --- > > Key: FLINK-3150 > URL: https://issues.apache.org/jira/browse/FLINK-3150 > Project: Flink > Issue Type: Improvement > Components: YARN >Reporter: Robert Metzger >Assignee: Nico Kruber > Labels: qa > Fix For: 1.3.0 > > > Currently, the JVM invocation call of YARN containers is hardcoded. > With this change, I would like to make the call configurable, using a string > such as > "%java% %memopts% %jvmopts% ..." > Also, we should respect the {{java.env.home}} if it's set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
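To make the template idea concrete, here is a small sketch of how a string such as "%java% %memopts% %jvmopts%" could be expanded into a command line. It is not the code from the merged commit. The placeholder names come from the issue text, while the class `ContainerCommandTemplate`, the method `render` and the example values are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the implementation merged into Flink.
class ContainerCommandTemplate {

    // Replaces every %key% placeholder in the template with its value.
    // String.replace performs literal replacement, so characters such as
    // '$' in the values need no escaping.
    static String render(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("%" + entry.getKey() + "%", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("java", "$JAVA_HOME/bin/java"); // example value, an assumption
        values.put("memopts", "-Xmx1024m");        // example value, an assumption
        values.put("jvmopts", "-XX:+UseG1GC");     // example value, an assumption
        System.out.println(render("%java% %memopts% %jvmopts%", values));
    }
}
```

One caveat of this simple loop is that a value which itself contains a `%key%` pattern may be expanded by a later iteration, so a production version would probably want a single-pass substitution.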
[jira] [Updated] (FLINK-5553) Job fails during deployment with IllegalStateException from subpartition request
[ https://issues.apache.org/jira/browse/FLINK-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5553: -- Attachment: application-1484132267957-0076 > Job fails during deployment with IllegalStateException from subpartition > request > > > Key: FLINK-5553 > URL: https://issues.apache.org/jira/browse/FLINK-5553 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.3.0 >Reporter: Robert Metzger > Attachments: application-1484132267957-0076 > > > While running a test job with Flink 1.3-SNAPSHOT > (6fb6967b9f9a31f034bd09fcf76aaf147bc8e9a0) the job failed with this exception: > {code} > 2017-01-18 14:56:27,043 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph- Sink: Unnamed > (9/10) (befc06d0e792c2ce39dde74b365dd3cf) switched from DEPLOYING to RUNNING. > 2017-01-18 14:56:27,059 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph- Flat Map > (9/10) (e94a01ec283e5dce7f79b02cf51654c4) switched from DEPLOYING to RUNNING. > 2017-01-18 14:56:27,817 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph- Flat Map > (10/10) (cbb61c9a2f72c282877eb383e111f7cd) switched from RUNNING to FAILED. > java.lang.IllegalStateException: There has been an error in the channel. 
> at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:195) > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77) > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104) > at > org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115) > at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419) > at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666) > at java.lang.Thread.run(Thread.java:745) > 2017-01-18 14:56:27,819 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph- Job > Misbehaved Job (b1d985d11984df57400fdff2bb656c59) switched from state RUNNING > to FAILING. > java.lang.IllegalStateException: There has been an error in the channel. 
> at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:195) > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77) > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104) > at > org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115) > at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419) > at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666) > at java.lang.Thread.run(Thread.java:745) > {code} > This is the first exception that is reported to the jobmanager. > I think this is related to missing network buffers. You see that from the > next deployment after the restart, where the deployment fails with the > insufficient number of buffers exception. > I'll add logs to the JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5553) Job fails during deployment with IllegalStateException from subpartition request
Robert Metzger created FLINK-5553: - Summary: Job fails during deployment with IllegalStateException from subpartition request Key: FLINK-5553 URL: https://issues.apache.org/jira/browse/FLINK-5553 Project: Flink Issue Type: Bug Components: Network Affects Versions: 1.3.0 Reporter: Robert Metzger While running a test job with Flink 1.3-SNAPSHOT (6fb6967b9f9a31f034bd09fcf76aaf147bc8e9a0) the job failed with this exception: {code} 2017-01-18 14:56:27,043 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Sink: Unnamed (9/10) (befc06d0e792c2ce39dde74b365dd3cf) switched from DEPLOYING to RUNNING. 2017-01-18 14:56:27,059 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Flat Map (9/10) (e94a01ec283e5dce7f79b02cf51654c4) switched from DEPLOYING to RUNNING. 2017-01-18 14:56:27,817 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Flat Map (10/10) (cbb61c9a2f72c282877eb383e111f7cd) switched from RUNNING to FAILED. java.lang.IllegalStateException: There has been an error in the channel. 
at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77) at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104) at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441) at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666) at java.lang.Thread.run(Thread.java:745) 2017-01-18 14:56:27,819 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Job Misbehaved Job (b1d985d11984df57400fdff2bb656c59) switched from state RUNNING to FAILING. java.lang.IllegalStateException: There has been an error in the channel. 
at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195) at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77) at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104) at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441) at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666) at java.lang.Thread.run(Thread.java:745) {code} This is the first exception that is reported to the jobmanager. I think this is related to missing network buffers. You see that from the next deployment after the restart, where the deployment fails with the insufficient number of buffers exception. I'll add logs to the JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5462) Flink job fails due to java.util.concurrent.CancellationException while snapshotting
[ https://issues.apache.org/jira/browse/FLINK-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828276#comment-15828276 ] Robert Metzger commented on FLINK-5462: --- I've looked through this and other logs with the same error, and I think the problem is that this error "only" occurs in the presence of other failures not as a root cause for an issue. I don't think that we need to urgently fix this issue for the 1.2 release. > Flink job fails due to java.util.concurrent.CancellationException while > snapshotting > > > Key: FLINK-5462 > URL: https://issues.apache.org/jira/browse/FLINK-5462 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.2.0 >Reporter: Robert Metzger > Attachments: application-1484132267957-0005 > > > I'm using Flink 699f4b0. > My restored, rescaled Flink job failed while creating a checkpoint with the > following exception: > {code} > 2017-01-11 18:46:49,853 INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering > checkpoint 3 @ 1484160409846 > 2017-01-11 18:49:50,111 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph- > TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream > .apply(AllWindowedStream.java:440)) (1/1) (2accc6ca2727c4f7ec963318fbd237e9) > switched from RUNNING to FAILED. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 > for operator TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.ap > ply(AllWindowedStream.java:440)) (1/1).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 3 for > operator TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.apply(AllWind > owedStream.java:440)) (1/1). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:899) > ... 5 more > 2017-01-11 18:49:50,113 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph- Job Generate > Event Window stream (90859d392c1da472e07695f434b332ef) switched from state > RUNNING to FAILING. 
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 3 > for operator TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.ap > ply(AllWindowedStream.java:440)) (1/1).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 3 for > operator TriggerWindow(TumblingEventTimeWindows(4), > ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071}, > EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1). > ... 6 more > Caused by: java.util.concurrent.CancellationException > at java.util.concurrent.FutureTask.report(FutureTask.java:121) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUt
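For readers unfamiliar with where the `CancellationException` in the trace above comes from: `java.util.concurrent.FutureTask.get()` throws it whenever the task was cancelled before completing, regardless of who cancelled it. The sketch below shows that JDK behaviour in isolation. The helper `runIfNotDoneAndGet` is a guess at what a method of that name does (run the task if it has not finished, then fetch the result) and is not copied from Flink's `FutureUtil`.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Stand-alone illustration of JDK behaviour; not Flink code.
class CancelledFutureDemo {

    // Assumed shape of a "run if not done, then get" helper.
    static <T> T runIfNotDoneAndGet(FutureTask<T> task)
            throws ExecutionException, InterruptedException {
        if (!task.isDone()) {
            task.run();
        }
        return task.get(); // throws CancellationException for cancelled tasks
    }

    // Maps the three possible endings of the helper to a short label.
    static String outcome(FutureTask<String> task) {
        try {
            return "result: " + runIfNotDoneAndGet(task);
        } catch (CancellationException e) {
            return "cancelled";
        } catch (ExecutionException | InterruptedException e) {
            return "failed";
        }
    }

    public static void main(String[] args) {
        FutureTask<String> normal = new FutureTask<>(() -> "snapshot");
        FutureTask<String> cancelled = new FutureTask<>(() -> "snapshot");
        cancelled.cancel(true); // e.g. the task is cancelled because of another failure
        System.out.println(outcome(normal));
        System.out.println(outcome(cancelled));
    }
}
```

This is consistent with the comment that the exception shows up as a consequence of other failures: something else cancels the snapshot future, and the checkpoint thread then observes the cancellation when it asks for the result.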
[jira] [Created] (FLINK-5555) Add documentation about debugging watermarks
Robert Metzger created FLINK-5555: - Summary: Add documentation about debugging watermarks Key: FLINK-5555 URL: https://issues.apache.org/jira/browse/FLINK-5555 Project: Flink Issue Type: Sub-task Components: Documentation Affects Versions: 1.2.0 Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 1.2.0 This was a frequent question on the mailing list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5005) Publish Scala 2.12 artifacts
[ https://issues.apache.org/jira/browse/FLINK-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-5005: -- Component/s: Scala API > Publish Scala 2.12 artifacts > > > Key: FLINK-5005 > URL: https://issues.apache.org/jira/browse/FLINK-5005 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Andrew Roberts > > Scala 2.12 was [released|http://www.scala-lang.org/news/2.12.0] today, and > offers many compile-time and runtime speed improvements. It would be great to > get artifacts up on maven central to allow Flink users to migrate to Scala > 2.12.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5005) Publish Scala 2.12 artifacts
[ https://issues.apache.org/jira/browse/FLINK-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829993#comment-15829993 ] Robert Metzger commented on FLINK-5005: --- For the {{flakka-*}} artifacts, I can probably build and deploy 2.12 versions. Flakka is an Akka fork with a feature backport from newer Akka versions. > Publish Scala 2.12 artifacts > > > Key: FLINK-5005 > URL: https://issues.apache.org/jira/browse/FLINK-5005 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Andrew Roberts > > Scala 2.12 was [released|http://www.scala-lang.org/news/2.12.0] today, and > offers many compile-time and runtime speed improvements. It would be great to > get artifacts up on maven central to allow Flink users to migrate to Scala > 2.12.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-5005) Publish Scala 2.12 artifacts
[ https://issues.apache.org/jira/browse/FLINK-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829993#comment-15829993 ] Robert Metzger edited comment on FLINK-5005 at 1/19/17 2:11 PM: For the {{flakka-*}} artifacts, I can probably build and deploy 2.12 versions. Flakka is an Akka fork with a feature backport from newer Akka versions. Flakka is built from here: https://github.com/dataArtisans/flakka was (Author: rmetzger): For the {{flakka-*}} artifacts, I can probably build and deploy 2.12 versions. Flakka is an Akka fork with a feature backport from newer Akka versions. > Publish Scala 2.12 artifacts > > > Key: FLINK-5005 > URL: https://issues.apache.org/jira/browse/FLINK-5005 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Andrew Roberts > > Scala 2.12 was [released|http://www.scala-lang.org/news/2.12.0] today, and > offers many compile-time and runtime speed improvements. It would be great to > get artifacts up on maven central to allow Flink users to migrate to Scala > 2.12.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)