[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1
[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994197#comment-14994197 ]

ASF GitHub Bot commented on FLINK-2979:
---

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1336#issuecomment-154499240

    Manually merged

> RollingSink does not work with Hadoop 2.7.1
> ---
>
>                 Key: FLINK-2979
>                 URL: https://issues.apache.org/jira/browse/FLINK-2979
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>    Affects Versions: 0.10
>            Reporter: Till Rohrmann
>            Assignee: Aljoscha Krettek
>             Fix For: 1.0
>
>
> When executing the {{RollingSinkFaultToleranceITCase}} with Hadoop 2.7.1,
> then the test either does not finish because it's stuck in an endless restart
> loop with the following exception
> {code}
> java.lang.Exception: Could not restore checkpointed state to operators and functions
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:414)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Failed to restore state to function: In-Progress file hdfs://127.0.0.1:52884/string-non-rolling-out/part-0-1 was neither moved to pending nor is still in progress.
>     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:165)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:406)
>     ... 3 more
> Caused by: java.lang.RuntimeException: In-Progress file hdfs://127.0.0.1:52884/string-non-rolling-out/part-0-1 was neither moved to pending nor is still in progress.
>     at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:670)
>     at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:120)
>     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:162)
>     ... 4 more
> {code}
> or it fails because the number of read strings differs from the exactly-once
> result (some strings are read multiple times).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1
[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994198#comment-14994198 ]

ASF GitHub Bot commented on FLINK-2979:
---

Github user aljoscha closed the pull request at:

    https://github.com/apache/flink/pull/1336
[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1
[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994205#comment-14994205 ]

ASF GitHub Bot commented on FLINK-2979:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1336#issuecomment-154499798

    Can you also merge this into the 0.10 branch, so the 0.10.1 release will get the fix?
[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1
[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994207#comment-14994207 ]

ASF GitHub Bot commented on FLINK-2979:
---

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1336#issuecomment-154499953

    Already done. :smiley:
[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1
[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993808#comment-14993808 ]

ASF GitHub Bot commented on FLINK-2979:
---

GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/1336

    [FLINK-2979] Fix RollingSink truncate for Hadoop 2.7

    The problem was that truncate is asynchronous and the RollingSink
    did not take this into account. It now has a loop after the truncate
    call that waits until the file is actually truncated.

    This also switches the Travis build from Hadoop 2.6 to 2.7.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink rolling-sink-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1336.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1336

commit 05136da77617a577e62cca2dec469e2c2d14b91e
Author: Aljoscha Krettek
Date:   2015-11-06T15:15:06Z

    [FLINK-2979] Fix RollingSink truncate for Hadoop 2.7

    The problem was that truncate is asynchronous and the RollingSink
    did not take this into account. It now has a loop after the truncate
    call that waits until the file is actually truncated.

    This also switches the Travis build from Hadoop 2.6 to 2.7.
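The wait loop described in the pull request can be illustrated with a minimal, self-contained sketch. This is not the code from the pull request: the class `TruncateWait`, the method `awaitLength`, and the timeout and poll-interval parameters are hypothetical names introduced here for illustration, and the sketch polls a generic length supplier instead of talking to HDFS. Against a real Hadoop `FileSystem`, the supplier would presumably wrap something like `fs.getFileStatus(path).getLen()`; whether the actual fix uses a timeout at all is not stated in the thread.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration; not the actual RollingSink code. */
final class TruncateWait {

    /**
     * Polls currentLength until it reports targetLength, sleeping pollMillis
     * between checks. Throws TimeoutException if the target length is not
     * observed within timeoutMillis (the timeout is an assumption of this sketch).
     */
    static void awaitLength(LongSupplier currentLength, long targetLength,
                            long timeoutMillis, long pollMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (currentLength.getAsLong() != targetLength) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("File did not reach length " + targetLength
                        + " within " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate an asynchronous truncate: the reported length only
        // shrinks after a short delay on a background thread.
        AtomicLong length = new AtomicLong(100);
        Thread background = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            length.set(42);
        });
        background.start();

        // Block until the simulated file reports the truncated length.
        awaitLength(length::get, 42, 5_000, 10);
        System.out.println("length after wait: " + length.get());
        background.join();
    }
}
```

Bounding the wait with a timeout is a design choice of this sketch: it turns a truncate that never completes into a visible failure instead of a silent hang during state restore.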
[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1
[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991725#comment-14991725 ]

Till Rohrmann commented on FLINK-2979:
--

The failure might be caused by

{code}
java.lang.Exception: Could not restore checkpointed state to operators and functions
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:414)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Failed to restore state to function: Could not invoke truncate.
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:165)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:406)
    ... 3 more
Caused by: java.lang.RuntimeException: Could not invoke truncate.
    at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:695)
    at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:120)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:162)
    ... 4 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:678)
    ... 6 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to TRUNCATE_FILE /string-non-rolling-out/part-2-2 for DFSClient_NONMAPREDUCE_-401178409_229 on 127.0.0.1 because DFSClient_NONMAPREDUCE_-401178409_229 is already the current lease holder.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2885)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInternal(FSNamesystem.java:2082)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInt(FSNamesystem.java:2028)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncate(FSNamesystem.java:1998)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.truncate(NameNodeRpcServer.java:926)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.truncate(ClientNamenodeProtocolServerSideTranslatorPB.java:599)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
    at org.apache.hadoop.ipc.Client.call(Client.java:1476)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy23.truncate(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.truncate(ClientNamenodeProtocolTranslatorPB.java:313)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy24.truncate(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.truncate(DFSClient.java:2024)
    at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:689)
    at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:685)
    at