[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994197#comment-14994197 ]

ASF GitHub Bot commented on FLINK-2979:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1336#issuecomment-154499240
  
Manually merged 


> RollingSink does not work with Hadoop 2.7.1
> ---
>
>              Key: FLINK-2979
>              URL: https://issues.apache.org/jira/browse/FLINK-2979
>          Project: Flink
>       Issue Type: Bug
>       Components: Streaming Connectors
> Affects Versions: 0.10
>         Reporter: Till Rohrmann
>         Assignee: Aljoscha Krettek
>          Fix For: 1.0
>
>
> When executing the {{RollingSinkFaultToleranceITCase}} with Hadoop 2.7.1, 
> the test either does not finish because it is stuck in an endless restart 
> loop with the following exception:
> {code}
> java.lang.Exception: Could not restore checkpointed state to operators and functions
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:414)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Failed to restore state to function: In-Progress file hdfs://127.0.0.1:52884/string-non-rolling-out/part-0-1 was neither moved to pending nor is still in progress.
>   at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:165)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:406)
>   ... 3 more
> Caused by: java.lang.RuntimeException: In-Progress file hdfs://127.0.0.1:52884/string-non-rolling-out/part-0-1 was neither moved to pending nor is still in progress.
>   at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:670)
>   at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:120)
>   at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:162)
>   ... 4 more
> {code}
> or it fails because the number of read strings differs from the exactly-once 
> result (some strings are read multiple times).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994198#comment-14994198 ]

ASF GitHub Bot commented on FLINK-2979:
---

Github user aljoscha closed the pull request at:

https://github.com/apache/flink/pull/1336




[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994205#comment-14994205 ]

ASF GitHub Bot commented on FLINK-2979:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1336#issuecomment-154499798
  
Can you also merge this into the 0.10 branch, so the 0.10.1 release will 
get the fix?




[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994207#comment-14994207 ]

ASF GitHub Bot commented on FLINK-2979:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1336#issuecomment-154499953
  
Already done. :smiley: 




[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1

2015-11-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993808#comment-14993808 ]

ASF GitHub Bot commented on FLINK-2979:
---

GitHub user aljoscha opened a pull request:

https://github.com/apache/flink/pull/1336

[FLINK-2979] Fix RollingSink truncate for Hadoop 2.7

The problem was that truncate is asynchronous and the RollingSink was
not taking this into account.

Now it has a loop after the truncate call that waits until the file is
actually truncated.

This also changes the Hadoop 2.6 Travis build to 2.7.
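
The wait-after-truncate idea described above can be illustrated with a minimal sketch. This is not the code from the pull request; the class name `TruncateWait`, the method `awaitLength`, and the timeout and poll-interval parameters are hypothetical. The file length is abstracted behind a `LongSupplier` so the example runs without a Hadoop cluster; in a real sink the supplier would presumably query the file system, for example via `FileSystem.getFileStatus(path).getLen()`.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Flink or Hadoop.
class TruncateWait {

    /**
     * Polls currentLength until it reports expectedLength or the timeout expires.
     * Returns true if the expected length was observed, false on timeout or interrupt.
     */
    static boolean awaitLength(LongSupplier currentLength, long expectedLength,
                               long timeoutMillis, long pollIntervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (currentLength.getAsLong() != expectedLength) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the file never reached the expected length
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated file: reports the old length (100) twice, then the truncated length (42).
        final long[] polls = {0};
        LongSupplier simulatedLength = () -> ++polls[0] < 3 ? 100L : 42L;

        boolean ok = awaitLength(simulatedLength, 42L, 1000L, 1L);
        System.out.println("truncated=" + ok); // prints truncated=true
    }
}
```

Bounding the loop with a timeout is a design choice of this sketch: it avoids blocking state restore forever if the truncate never completes, at the cost of having to handle the `false` case in the caller.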

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/flink rolling-sink-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1336.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1336


commit 05136da77617a577e62cca2dec469e2c2d14b91e
Author: Aljoscha Krettek 
Date:   2015-11-06T15:15:06Z

[FLINK-2979] Fix RollingSink truncate for Hadoop 2.7

The problem was, that truncate is asynchronous and the RollingSink was
not taking this into account.

Now it has a loop after the truncate call that waits until the file is
actually truncated.

This also changes the Hadoop 2.6 travis build to 2.7, instead.






[jira] [Commented] (FLINK-2979) RollingSink does not work with Hadoop 2.7.1

2015-11-05 Thread Till Rohrmann (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991725#comment-14991725 ]

Till Rohrmann commented on FLINK-2979:
--

The failure might be caused by:

{code}
java.lang.Exception: Could not restore checkpointed state to operators and functions
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:414)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Failed to restore state to function: Could not invoke truncate.
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:165)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:406)
    ... 3 more
Caused by: java.lang.RuntimeException: Could not invoke truncate.
    at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:695)
    at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:120)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:162)
    ... 4 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:678)
    ... 6 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to TRUNCATE_FILE /string-non-rolling-out/part-2-2 for DFSClient_NONMAPREDUCE_-401178409_229 on 127.0.0.1 because DFSClient_NONMAPREDUCE_-401178409_229 is already the current lease holder.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2885)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInternal(FSNamesystem.java:2082)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInt(FSNamesystem.java:2028)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncate(FSNamesystem.java:1998)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.truncate(NameNodeRpcServer.java:926)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.truncate(ClientNamenodeProtocolServerSideTranslatorPB.java:599)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

    at org.apache.hadoop.ipc.Client.call(Client.java:1476)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy23.truncate(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.truncate(ClientNamenodeProtocolTranslatorPB.java:313)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy24.truncate(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.truncate(DFSClient.java:2024)
    at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:689)
    at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:685)
at