Re: Could not build up connection to JobManager
It is really strange. It's right that the CliFrontend now resolves localhost to the correct local address 10.218.100.122. Moreover, according to the logs, the JobManager is also started and binds to akka.tcp://flink@10.218.100.122:6123. According to the logs, this is also the address the CliFrontend uses to connect to the JobManager. If the timestamps are correct, then the JobManager was still alive when the job was sent.

I don't really understand why this happens. Can it be that the CliFrontend which binds to 127.0.0.1 cannot communicate with 10.218.100.122? Can it be that you have some settings which prevent this?

For the failing 127.0.0.1 case, it would be helpful to have access to the JobManager log.

I've updated the branch https://github.com/tillrohrmann/flink/tree/fixJobClient with a new fix for the localhost scenario. Could you try it out again?

Thanks a lot for your help.

Best regards,
Till

On Mon, Mar 16, 2015 at 10:30 AM, Ufuk Celebi u...@apache.org wrote:

There was an issue for this: https://issues.apache.org/jira/browse/FLINK-1634 Can we close it then?
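Till's question whether a client bound to 127.0.0.1 can talk to 10.218.100.122 can be checked independently of Flink with a plain TCP connect. The following is only a minimal Java sketch: the class and method names are made up for illustration and are not part of Flink, and the address and port are the ones from the logs in this thread.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {

    /**
     * Opens a TCP connection from localAddress (any free local port) to
     * remoteHost:remotePort and reports whether it succeeded within timeoutMillis.
     */
    static boolean canConnect(InetAddress localAddress, String remoteHost,
                              int remotePort, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Bind explicitly so the connection originates from the chosen address.
            socket.bind(new InetSocketAddress(localAddress, 0));
            socket.connect(new InetSocketAddress(remoteHost, remotePort), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        System.out.println("127.0.0.1 -> 10.218.100.122:6123 reachable: "
                + canConnect(loopback, "10.218.100.122", 6123, 2000));
    }
}
```

A successful connect only shows that the port accepts TCP connections from that source address; it says nothing about whether the actor system behind it accepts the message.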
Re: Could not build up connection to JobManager
Could you please upload the logs? They would be really helpful.

On Mon, Mar 16, 2015 at 6:11 PM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, I tested the update but it’s still the same. I don’t think it is a problem with my system, because I have an XAMPP server working totally fine (I also tried with it shut down) and I double-checked the hosts files. I had Little Snitch installed, but I also tried uninstalling it. Isn’t there a way around this without using DNS to resolve localhost?

On Mar 16, 2015, at 10:04 PM, Till Rohrmann trohrm...@apache.org wrote:

It is really strange. It's right that the CliFrontend now resolves localhost to the correct local address 10.218.100.122. Moreover, according to the logs, the JobManager is also started and binds to akka.tcp://flink@10.218.100.122:6123. According to the logs, this is also the address the CliFrontend uses to connect to the JobManager. If the timestamps are correct, then the JobManager was still alive when the job was sent.

I don't really understand why this happens. Can it be that the CliFrontend which binds to 127.0.0.1 cannot communicate with 10.218.100.122? Can it be that you have some settings which prevent this?

For the failing 127.0.0.1 case, it would be helpful to have access to the JobManager log.

I've updated the branch https://github.com/tillrohrmann/flink/tree/fixJobClient with a new fix for the localhost scenario. Could you try it out again?

Thanks a lot for your help.

Best regards,
Till
Re: Could not build up connection to JobManager
There was an issue for this: https://issues.apache.org/jira/browse/FLINK-1634 Can we close it then?

On Sat, Mar 14, 2015 at 9:16 PM, Dulaj Viduranga vidura...@icloud.com wrote:

Hey Stephan, Great to know you could fix the issue. Thank you for the update. Best regards.

On Mar 14, 2015, at 9:19 PM, Stephan Ewen se...@apache.org wrote:

Hey Dulaj! Forget what I said in the previous email. The issue with the wrong address binding seems to be solved now. There is another issue that the embedded taskmanager does not start properly, for whatever reason. My gut feeling is that there is something wrong There is a patch pending that changes the startup behavior to debug these situations much easier. I'll ping you as soon as that is in...

Stephan
Re: Could not build up connection to JobManager
Hey Dulaj! One thing you can try is to add to the JVM startup options (in the scripts in the bin folder) the option -Djava.net.preferIPv4Stack=true and see if that helps it?

Stephan

On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, Still this is no luck. I’ll upload the logs with configuration “localhost” as well as “127.0.0.1” so you can take a look.

127.0.0.1
flink-Vidura-flink-client-localhost.log https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log

localhost
flink-Vidura-flink-client-localhost.log https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log
flink-Vidura-jobmanager-localhost.log https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log
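To see what Stephan's -Djava.net.preferIPv4Stack=true option changes on a given machine, the JVM can be asked which addresses it resolves for a host name and which family each one belongs to. This is a small Java sketch, not Flink code; the class name AddressFamilies is hypothetical. Running it once without the option and once with it should show whether IPv6 addresses are still offered for localhost.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

class AddressFamilies {

    /** Returns "IPv4", "IPv6" or "unknown" for the given address. */
    static String family(InetAddress address) {
        if (address instanceof Inet4Address) {
            return "IPv4";
        }
        if (address instanceof Inet6Address) {
            return "IPv6";
        }
        return "unknown";
    }

    public static void main(String[] args) throws UnknownHostException {
        // Host name to inspect; defaults to the name discussed in this thread.
        String host = args.length > 0 ? args[0] : "localhost";

        // Prints "null" when the option was not passed to the JVM.
        System.out.println("java.net.preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));

        for (InetAddress address : InetAddress.getAllByName(host)) {
            System.out.println(host + " -> " + address.getHostAddress()
                    + " (" + family(address) + ")");
        }
    }
}
```

It can be started as `java -Djava.net.preferIPv4Stack=true AddressFamilies localhost` after compiling; the output depends entirely on the local resolver configuration.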
Re: Could not build up connection to JobManager
Hey Dulaj! Forget what I said in the previous email. The issue with the wrong address binding seems to be solved now. There is another issue that the embedded taskmanager does not start properly, for whatever reason. My gut feeling is that there is something wrong There is a patch pending that changes the startup behavior to debug these situations much easier. I'll ping you as soon as that is in...

Stephan

On Sat, Mar 14, 2015 at 4:42 PM, Stephan Ewen se...@apache.org wrote:

Hey Dulaj! One thing you can try is to add to the JVM startup options (in the scripts in the bin folder) the option -Djava.net.preferIPv4Stack=true and see if that helps it?

Stephan
Re: Could not build up connection to JobManager
Hi, Still this is no luck. I’ll upload the logs with configuration “localhost” as well as “127.0.0.1” so you can take a look.

127.0.0.1
flink-Vidura-flink-client-localhost.log https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log

localhost
flink-Vidura-flink-client-localhost.log https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log
flink-Vidura-jobmanager-localhost.log https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log

On Mar 11, 2015, at 11:32 PM, Till Rohrmann trohrm...@apache.org wrote:

Hi Dulaj, sorry for my late response. It looks as if the JobClient tries to connect to the JobManager using its IPv6 instead of IPv4. Akka is really picky when it comes to remote address. If Akka binds to the FQDN, then other ActorSystem which try to connect to it using its IP address won't be successful. I assume that this might be a problem. I tried to fix it. You can find it here [1]. Could you please try it out by starting a local cluster with the start-local.sh script. If it fails, could you please send me all log files (client, jobmanager and taskmanager). Once we figured out why the JobClient does not connect, we can try to tackle the BlobServer issue.

Cheers,
Till

[1] https://github.com/tillrohrmann/flink/tree/fixJobClient
Re: Could not build up connection to JobManager
Hi Dulaj, sorry for my late response. It looks as if the JobClient tries to connect to the JobManager using its IPv6 instead of IPv4. Akka is really picky when it comes to remote address. If Akka binds to the FQDN, then other ActorSystem which try to connect to it using its IP address won't be successful. I assume that this might be a problem. I tried to fix it. You can find it here [1]. Could you please try it out by starting a local cluster with the start-local.sh script. If it fails, could you please send me all log files (client, jobmanager and taskmanager). Once we figured out why the JobClient does not connect, we can try to tackle the BlobServer issue.

Cheers,
Till

[1] https://github.com/tillrohrmann/flink/tree/fixJobClient

On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, The error message is,

21:06:01,521 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
    at org.apache.flink.client.program.Client.run(Client.java:327)
    at org.apache.flink.client.program.Client.run(Client.java:306)
    at org.apache.flink.client.program.Client.run(Client.java:300)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
    at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
    at org.apache.flink.client.program.Client.run(Client.java:250)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
    at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
    at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
    at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
    at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
    at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
    at org.apache.flink.client.program.Client.run(Client.java:322)
    ... 15 more
Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
    at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
    at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
    at
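Till's remark that Akka is picky about remote addresses comes down, as far as I understand it, to the host part of the actor URL being treated as text rather than being resolved. The Java sketch below only illustrates that effect with plain strings. It does not use Akka, the host name jobmanager.example.org and the class name are invented, and only the URL shape akka.tcp://flink@host:6123/user/jobmanager is taken from the error message in this thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class ActorUrlMismatch {

    /** Builds a JobManager actor URL in the shape seen in the error message. */
    static String jobManagerUrl(String host, int port) {
        return "akka.tcp://flink@" + host + ":" + port + "/user/jobmanager";
    }

    public static void main(String[] args) throws UnknownHostException {
        // One machine, created without any DNS lookup: a name plus its IPv4 address.
        InetAddress machine = InetAddress.getByAddress(
                "jobmanager.example.org", new byte[] {10, (byte) 218, 100, 122});

        // The server side announces itself by name, the client dials the IP.
        String boundTo = jobManagerUrl(machine.getHostName(), 6123);
        String connectTo = jobManagerUrl(machine.getHostAddress(), 6123);

        System.out.println("bound to:   " + boundTo);
        System.out.println("connect to: " + connectTo);
        System.out.println("same text:  " + boundTo.equals(connectTo));
    }
}
```

Both URLs point at the same machine, yet they differ as strings, which is the kind of mismatch that would make using one consistent host value on both sides (name or IP, but not a mix) the safer choice.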
Re: Could not build up connection to JobManager
Hi Till,

I’m sorry. It doesn’t seem to solve the problem. The taskmanager still tries a 10.0.0.0/8 IP.

Best regards.

On Mar 5, 2015, at 1:00 PM, Till Rohrmann till.rohrm...@gmail.com wrote:

Hi Dulaj,

I looked through your commit and noticed that the JobClient might not be listening on the right network interface. Your commit seems to fix it. I just want to understand the problem properly and therefore I opened a branch with a small change. Could you try out whether this change would also fix your problem? You can find the code here [1]. It would be awesome if you checked it out and ran it on your cluster setup. Thanks a lot, Dulaj!

[1] https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient

On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga vidura...@icloud.com wrote:

Not every change in commit b7da22a is required, but I thought they were appropriate.

On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, I found many other places where “localhost” is hard coded. I changed them in what I think is a better way. I made a pull request. Please review. b7da22a https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd

On Mar 4, 2015, at 8:17 PM, Stephan Ewen se...@apache.org wrote:

If I recall correctly, we only hardcode localhost in the local mini cluster - do you think it is problematic there as well? Have you found any other places?

On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga vidura...@icloud.com wrote:

In some places of the code, localhost is hard coded. When it is resolved by DNS, it can be directed to an IP other than 127.0.0.1 (for example, one in the private range 10.0.0.0/8). I changed those places to 127.0.0.1 and it works like a charm. But hard coding 127.0.0.1 is not a good option, because when the jobmanager IP changes, this becomes an issue again. I'm thinking of using the jobmanager IP from the config.yaml in these places. If you have a better idea, based on your experience, please let me know. Best.
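The idea at the end of this message, taking the JobManager address from the configuration instead of hard-coding it, can be sketched roughly as follows. This is only an illustration in Python, not Flink's code: the keys `jobmanager.rpc.address` and `jobmanager.rpc.port` are the ones that appear in the configuration file posted in this thread, while the function names and the fallback port are hypothetical.

```python
def parse_flat_conf(text):
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as '::1' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def jobmanager_endpoint(conf, default_port=6123):
    """Return (host, port) from the parsed config; fail loudly if no address is set.

    default_port is an assumption made for this sketch.
    """
    host = conf.get("jobmanager.rpc.address")
    if not host:
        raise KeyError("jobmanager.rpc.address is not set")
    port = int(conf.get("jobmanager.rpc.port", default_port))
    return host, port
```

Every component that needs the JobManager address would then call one such helper, so changing the address in the configuration file changes it everywhere at once.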
Re: Could not build up connection to JobManager
Hi, The error message is, 21:06:01,521 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager. at org.apache.flink.client.program.Client.run(Client.java:327) at org.apache.flink.client.program.Client.run(Client.java:306) at org.apache.flink.client.program.Client.run(Client.java:300) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:250) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114) Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable. 
at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957) at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151) at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142) at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125) at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala) at org.apache.flink.client.program.Client.run(Client.java:322) ... 15 more Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)] at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65) at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74) at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73) at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280) at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270) at 
akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63) at org.apache.flink.runtime.akka.AkkaUtils$.getReference(AkkaUtils.scala:321) at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:952) ... 20 more The exception above occurred while trying to run your command. Client log doesn’t seem to show any info, 21:06:01,521 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 21:06:01,935 INFO org.apache.flink.api.java.ExecutionEnvironment - The job has 0 registered types and 0 default Kryo serializers 21:06:02,857 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 21:06:02,909 INFO Remoting - Starting remoting 21:06:03,158 INFO Remoting
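One detail that stands out in the exception above is the JobManager address itself: a link-local IPv6 literal with a zone index (`fe80:...%11`) written without brackets, so the `:6123` port is hard to tell apart from the address groups. In URI syntax, IPv6 literals are normally enclosed in square brackets. Whether this is what breaks the lookup here is not established in the thread; the sketch below only illustrates the bracketing rule, and `host_port` is a hypothetical helper, not part of Flink or Akka.

```python
import ipaddress


def host_port(host, port):
    """Join host and port, bracketing IPv6 literals so the port separator is unambiguous."""
    bare = host.split("%")[0]  # ignore a zone index such as '%11' when classifying
    try:
        is_v6 = ipaddress.ip_address(bare).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal, for example a hostname
    return f"[{host}]:{port}" if is_v6 else f"{host}:{port}"
```

For an IPv4 address or a hostname the result is the plain `host:port` form; only IPv6 literals are wrapped.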
Re: Could not build up connection to JobManager
Glad I could help in any way. :) When the address is set to “localhost” I cannot submit a job. It fails immediately. But when the address is “127.0.0.1”, it is stuck for a little while on DEPLOYING and then fails. Correct me if I’m wrong, but since using the address from the config file won’t harm anything, I think it would be safer to use it rather than defining it in the code.

On Mar 5, 2015, at 6:57 PM, Till Rohrmann trohrm...@apache.org wrote:

Could you submit a job when you set the job manager address to localhost? I did not see any logging statements of received jobs. If you did, could you also send the logs of the client? The 0.0.0.0 to which the BlobServer binds works for me on my machine. I cannot remember that we had problems with that before. But I agree, we should set it to the network interface which the JobManager uses. I cannot explain why your fix solves the problem. It does not touch any of the JobClient/JobManager logic. I updated my local branch [1] with a fix for the BlobServer. Could you try it out again and send us the logs? Thanks a lot for your help Dulaj.

On Thu, Mar 5, 2015 at 1:24 PM, Dulaj Viduranga vidura...@icloud.com wrote:

But can you explain why my fix solved it?

On Mar 5, 2015, at 5:50 PM, Stephan Ewen se...@apache.org wrote:

Hi Dulaj! Okay, the logs give us some insight. Both setups seem to look good in terms of TaskManager and JobManager startup. In one of the logs (127.0.0.1) you submit a job. The job fails because the TaskManager cannot grab the JAR file from the JobManager. I think the problem is that the BLOB server binds to 0.0.0.0 - it should bind to the same address as the JobManager actor system. That should definitely be changed...

On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, This is the log with setting “localhost”:
flink-Vidura-jobmanager-localhost.log https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log
And this is the log with setting “127.0.0.1”:
flink-Vidura-jobmanager-localhost.log https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log

On Mar 5, 2015, at 2:23 PM, Till Rohrmann trohrm...@apache.org wrote:

What does the jobmanager log say? I think Stephan added some more logging output which helps us to debug this problem.

On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga vidura...@icloud.com wrote:

Using start-local.sh. I’m using the original config yaml. I also tried changing the jobmanager address in the config to “127.0.0.1” but no luck. With my changes it works ok. The conf file follows.

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#==
# Common
#==

jobmanager.rpc.address: 127.0.0.1
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1
parallelization.degree.default: 1

#==
# Web Frontend
#==

# The port under which the web-based runtime monitor listens.
# A value of -1 deactivates the web server.
jobmanager.web.port: 8081

# The port under which the standalone web client
# (for job upload and submit) listens.
webclient.port: 8080

#==
# Advanced
#==

# The number of buffers for the network stack.
#
# taskmanager.network.numberOfBuffers: 2048

# Directories for temporary files.
#
# Add a delimited list for multiple directories, using the system directory
# delimiter (colon ':' on unix) or a comma, e.g.:
#
Re: Could not build up connection to JobManager
Hi Dulaj! Okay, the logs give us some insight. Both setups seem to look good in terms of TaskManager and JobManager startup. In one of the logs (127.0.0.1) you submit a job. The job fails because the TaskManager cannot grab the JAR file from the JobManager. I think the problem is that the BLOB server binds to 0.0.0.0 - it should bind to the same address as the JobManager actor system. That should definitely be changed...

On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, This is the log with setting “localhost”:
flink-Vidura-jobmanager-localhost.log https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log
And this is the log with setting “127.0.0.1”:
flink-Vidura-jobmanager-localhost.log https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log

On Mar 5, 2015, at 2:23 PM, Till Rohrmann trohrm...@apache.org wrote:

What does the jobmanager log say? I think Stephan added some more logging output which helps us to debug this problem.

On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga vidura...@icloud.com wrote:

Using start-local.sh. I’m using the original config yaml. I also tried changing the jobmanager address in the config to “127.0.0.1” but no luck. With my changes it works ok. The conf file follows.

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#==
# Common
#==

jobmanager.rpc.address: 127.0.0.1
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1
parallelization.degree.default: 1

#==
# Web Frontend
#==

# The port under which the web-based runtime monitor listens.
# A value of -1 deactivates the web server.
jobmanager.web.port: 8081

# The port under which the standalone web client
# (for job upload and submit) listens.
webclient.port: 8080

#==
# Advanced
#==

# The number of buffers for the network stack.
#
# taskmanager.network.numberOfBuffers: 2048

# Directories for temporary files.
#
# Add a delimited list for multiple directories, using the system directory
# delimiter (colon ':' on unix) or a comma, e.g.:
# /data1/tmp:/data2/tmp:/data3/tmp
#
# Note: Each directory entry is read from and written to by a different I/O
# thread. You can include the same directory multiple times in order to create
# multiple I/O threads against that directory. This is for example relevant for
# high-throughput RAIDs.
#
# If not specified, the system-specific Java temporary directory (java.io.tmpdir
# property) is taken.
#
# taskmanager.tmp.dirs: /tmp

# Path to the Hadoop configuration directory.
#
# This configuration is used when writing into HDFS. Unless specified otherwise,
# HDFS file creation will use HDFS default settings with respect to block-size,
# replication factor, etc.
#
# You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
# via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
#
# fs.hdfs.hadoopconf: /path/to/hadoop/conf/

On Mar 5, 2015, at 2:03 PM, Till Rohrmann trohrm...@apache.org wrote:

How did you start the flink cluster? Using start-local.sh, start-cluster.sh, or by starting the job manager and task managers individually using taskmanager.sh/jobmanager.sh? Could you maybe post the flink-conf.yaml file you're using? With your changes, everything works, right?

On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga
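Stephan's point about the BLOB server binding to 0.0.0.0 rather than to the JobManager's address can be illustrated with plain sockets. The sketch below is not Flink's BlobServer code; `bound_address` is a hypothetical helper that only shows what a listener reports as its own address for a wildcard bind versus a specific bind. A wildcard listener accepts connections on every interface, but the address it reports is not one a remote peer can connect to, so a component that hands its bound address to other processes needs a concrete one. Whether that is the failure mode in this thread is an assumption.

```python
import socket


def bound_address(bind_host):
    """Bind a TCP listener to bind_host on an ephemeral port and return the host it reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((bind_host, 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        return server.getsockname()[0]
```

Calling `bound_address("0.0.0.0")` reports the wildcard address, while `bound_address("127.0.0.1")` reports the loopback address it was given.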
Re: Could not build up connection to JobManager
Could you submit a job when you set the job manager address to localhost? I did not see any logging statements of received jobs. If you did, could you also send the logs of the client? The 0.0.0.0 to which the BlobServer binds works for me on my machine. I cannot remember that we had problems with that before. But I agree, we should set it to the network interface which the JobManager uses. I cannot explain why your fix solves the problem. It does not touch any of the JobClient/JobManager logic. I updated my local branch [1] with a fix for the BlobServer. Could you try it out again and send us the logs? Thanks a lot for your help Dulaj.

On Thu, Mar 5, 2015 at 1:24 PM, Dulaj Viduranga vidura...@icloud.com wrote:

But can you explain why my fix solved it?
Re: Could not build up connection to JobManager
Hi, I found many other places where “localhost” is hard coded. I changed them in what I think is a better way. I made a pull request. Please review. b7da22a https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd

On Mar 4, 2015, at 8:17 PM, Stephan Ewen se...@apache.org wrote:

If I recall correctly, we only hardcode localhost in the local mini cluster - do you think it is problematic there as well? Have you found any other places?

On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga vidura...@icloud.com wrote:

In some places of the code, localhost is hard coded. When it is resolved by DNS, it can be directed to an IP other than 127.0.0.1 (for example, one in the private range 10.0.0.0/8). I changed those places to 127.0.0.1 and it works like a charm. But hard coding 127.0.0.1 is not a good option, because when the jobmanager IP changes, this becomes an issue again. I'm thinking of using the jobmanager IP from the config.yaml in these places. If you have a better idea, based on your experience, please let me know. Best.
Re: Could not build up connection to JobManager
Not every change in commit b7da22a is required, but I thought they were appropriate.

On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, I found many other places where “localhost” is hard coded. I changed them in what I think is a better way. I made a pull request. Please review. b7da22a https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd

On Mar 4, 2015, at 8:17 PM, Stephan Ewen se...@apache.org wrote:

If I recall correctly, we only hardcode localhost in the local mini cluster - do you think it is problematic there as well? Have you found any other places?

On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga vidura...@icloud.com wrote:

In some places of the code, localhost is hard coded. When it is resolved by DNS, it can be directed to an IP other than 127.0.0.1 (for example, one in the private range 10.0.0.0/8). I changed those places to 127.0.0.1 and it works like a charm. But hard coding 127.0.0.1 is not a good option, because when the jobmanager IP changes, this becomes an issue again. I'm thinking of using the jobmanager IP from the config.yaml in these places. If you have a better idea, based on your experience, please let me know. Best.
Re: Could not build up connection to JobManager
Hi, I found the fix for this issue and I'll create a pull request within the next day.
Re: Could not build up connection to JobManager
Calling:

java -cp ../examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar org.apache.flink.examples.java.clustering.util.KMeansDataGenerator 500 10 0.08

will not connect to Flink. It's just running a standalone KMeans data generator, not KMeans. I suspect that the KMeans example itself does not run either. You can run the KMeans example like this:

bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar

On Sat, Feb 28, 2015 at 5:47 AM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, I think I’m doing something wrong. After setting the jobManager address to 127.0.0.1, I can run the KMeans example (java -cp ../examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar org.apache.flink.examples.java.clustering.util.KMeansDataGenerator 500 10 0.08), but I can’t run the word count example (bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar file:'///Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt' file:'///Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count.txt’). I’m not sure whether I’m running it wrong.

On Feb 26, 2015, at 9:03 PM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, It’s great to help out. :) Setting 127.0.0.1 instead of “localhost” in jobmanager.rpc.address helped to build the connection to the jobmanager. Apparently localhost is resolved differently in the webclient and the jobmanager. I think it would be good to set jobmanager.rpc.address: 127.0.0.1 in future builds. But then I get this error when I try to run the examples. I don’t know if I should move this issue to another thread. If so, please tell me.
bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count 20:46:21,998 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 02/26/2015 20:46:23 Job execution switched to status RUNNING. 02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to SCHEDULED 02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to DEPLOYING 02/26/2015 20:48:03 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to FAILED akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [10 ms] at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691) at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) at 
akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) at java.lang.Thread.run(Thread.java:745) 02/26/2015 20:48:03 Job execution switched to status FAILING. 02/26/2015 20:48:03 Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to CANCELED 02/26/2015 20:48:03 DataSink(CsvOutputFormat (path: /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count, delimiter: ))(1/1) switched to CANCELED 02/26/2015 20:48:03 Job execution switched to status FAILED. org.apache.flink.client.program.ProgramInvocationException: The program execution failed. at org.apache.flink.client.program.Client.run(Client.java:344) at org.apache.flink.client.program.Client.run(Client.java:306) at org.apache.flink.client.program.Client.run(Client.java:300) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
Re: Could not build up connection to JobManager
In some places of the code, localhost is hard coded. When it is resolved by DNS, it can be directed to an IP other than 127.0.0.1 (for example, one in the private range 10.0.0.0/8). I changed those places to 127.0.0.1 and it works like a charm. But hard coding 127.0.0.1 is not a good option, because when the jobmanager IP changes, this becomes an issue again. I'm thinking of using the jobmanager IP from the config.yaml in these places. If you have a better idea, based on your experience, please let me know. Best.
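The claim that localhost does not necessarily resolve to 127.0.0.1 is easy to check on the affected machine. The following is a minimal sketch in Python using only the standard library; the function names are hypothetical, and the check says nothing about how Flink itself resolves names.

```python
import ipaddress
import socket


def resolve_all(hostname):
    """Return the sorted set of IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # The first element of each sockaddr is the address string;
    # drop any IPv6 zone suffix such as '%lo0'.
    return sorted({info[4][0].split("%")[0] for info in infos})


def non_loopback(addresses):
    """Return the addresses that are not loopback (127.0.0.0/8 or ::1)."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_loopback]
```

If `non_loopback(resolve_all("localhost"))` returns anything, such as an address in 10.0.0.0/8, then components that resolve "localhost" independently can end up on different interfaces, which matches the behaviour described in this message.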
Re: Could not build up connection to JobManager
Wow, great. Can you tell us what the issue was?

On 02.03.2015 at 09:31, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, I found the fix for this issue and I'll create a pull request within the next day.
Re: Could not build up connection to JobManager
Here is the taskmanager log from when I tried taskmanager.sh start:
flink-Vidura-taskmanager-localhost.log https://gist.github.com/anonymous/aef5a0bf8722feee9b97#file-flink-vidura-taskmanager-localhost-log

On Feb 27, 2015, at 4:12 PM, Till Rohrmann trohrm...@apache.org wrote:

It depends on how you started Flink. If you started a local cluster, then the TaskManager log is contained in the JobManager log; we just don't see the respective log output in the snippet you posted. If you started a TaskManager independently, either by taskmanager.sh or by start-cluster.sh, then a file with the name format flink-user-taskmanager-hostname.log should be created in flink/log/. If the Flink directory is not shared by your cluster nodes, then you have to look on the machine on which you started the TaskManager. But since the JobManager binds to 127.0.0.1, I guess that you started a local cluster. Check whether you can find some logging statements from the logger org.apache.flink.runtime.taskmanager.TaskManager in your log. Maybe you can upload the corresponding log file to [1] and post a link here.

Greets, Till

[1] https://gist.github.com/

On Thu, Feb 26, 2015 at 6:45 PM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, Can you tell me where I can find the TaskManager logs? I can’t find them in the logs folder. I don’t suppose I should run taskmanager.sh as well. Right? I’m using OS X Yosemite. I’ll send you my ifconfig.

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    options=3<RXCSUM,TXCSUM>
    inet6 ::1 prefixlen 128
    inet 127.0.0.1 netmask 0xff00
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    nd6 options=1<PERFORMNUD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0 mtu 1280
en0: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
    ether 60:03:08:a1:e0:f4
    nd6 options=1<PERFORMNUD>
    media: autoselect (unknown type)
    status: inactive
en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
    options=60<TSO4,TSO6>
    ether 72:00:02:32:14:d0
    media: autoselect full-duplex
    status: inactive
en2: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
    options=60<TSO4,TSO6>
    ether 72:00:02:32:14:d1
    media: autoselect full-duplex
    status: inactive
bridge0: flags=8822<BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
    options=63<RXCSUM,TXCSUM,TSO4,TSO6>
    ether 62:03:08:1a:fa:00
    Configuration:
        id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
        maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
        root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
        ipfilter disabled flags 0x2
    member: en1 flags=3<LEARNING,DISCOVER>
        ifmaxaddr 0 port 5 priority 0 path cost 0
    member: en2 flags=3<LEARNING,DISCOVER>
        ifmaxaddr 0 port 6 priority 0 path cost 0
    media: unknown type
    status: inactive
p2p0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 2304
    ether 02:03:08:a1:e0:f4
    media: autoselect
    status: inactive
awdl0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1452
    ether 06:56:3d:f6:60:08
    nd6 options=1<PERFORMNUD>
    media: autoselect
    status: inactive
ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
    inet 10.218.98.228 --> 10.64.64.64 netmask 0xff00
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1380
    inet6 fe80::b0d4:d4be:7e62:e730%utun0 prefixlen 64 scopeid 0xb
    inet6 fdd0:b291:7da7:9153:b0d4:d4be:7e62:e730 prefixlen 64
    nd6 options=1<PERFORMNUD>

On Feb 26, 2015, at 10:48 PM, Stephan Ewen se...@apache.org wrote:

Hi Dulaj! Thanks for helping to debug. My guess is that you are now seeing the same thing between JobManager and TaskManager as you saw before between JobManager and JobClient. I have a patch pending that should help with the issue (see https://issues.apache.org/jira/browse/FLINK-1608), let's see if that solves it. What seems not right is that the JobManager initially accepted the TaskManager and later the communication. Can you paste the TaskManager log as well? Also: There must be something fairly unique about your network configuration, as it works on all other setups that we use (locally, cloud, test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any chance?

Greetings, Stephan

On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, It’s great to help out. :) Setting 127.0.0.1 instead of “localhost” in jobmanager.rpc.address helped to build the connection to the jobmanager. Apparently localhost is resolved differently in the webclient and the jobmanager. I think it would be good to set jobmanager.rpc.address: 127.0.0.1 in future builds. But then I get this error when I try to run the examples. I don’t know if I
Re: Could not build up connection to JobManager
It depends on how you started Flink. If you started a local cluster, then the TaskManager log is contained in the JobManager log; we just don't see the respective log output in the snippet you posted. If you started a TaskManager independently, either by taskmanager.sh or by start-cluster.sh, then a file with the name format flink-user-taskmanager-hostname.log should be created in flink/log/. If the Flink directory is not shared by your cluster nodes, then you have to look on the machine on which you started the TaskManager. But since the JobManager binds to 127.0.0.1, I guess that you started a local cluster. Check whether you can find some logging statements from the logger org.apache.flink.runtime.taskmanager.TaskManager in your log. Maybe you can upload the corresponding log file to [1] and post a link here.

Greets, Till

[1] https://gist.github.com/

On Thu, Feb 26, 2015 at 6:45 PM, Dulaj Viduranga vidura...@icloud.com wrote:

Hi, Can you tell me where I can find the TaskManager logs? I can’t find them in the logs folder. I don’t suppose I should run taskmanager.sh as well. Right? I’m using OS X Yosemite. I’ll send you my ifconfig.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    options=3<RXCSUM,TXCSUM>
    inet6 ::1 prefixlen 128
    inet 127.0.0.1 netmask 0xff00
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    nd6 options=1<PERFORMNUD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0 mtu 1280
en0: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
    ether 60:03:08:a1:e0:f4
    nd6 options=1<PERFORMNUD>
    media: autoselect (unknown type)
    status: inactive
en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
    options=60<TSO4,TSO6>
    ether 72:00:02:32:14:d0
    media: autoselect full-duplex
    status: inactive
en2: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
    options=60<TSO4,TSO6>
    ether 72:00:02:32:14:d1
    media: autoselect full-duplex
    status: inactive
bridge0: flags=8822<BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
    options=63<RXCSUM,TXCSUM,TSO4,TSO6>
    ether 62:03:08:1a:fa:00
    Configuration:
        id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
        maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
        root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
        ipfilter disabled flags 0x2
    member: en1 flags=3<LEARNING,DISCOVER>
        ifmaxaddr 0 port 5 priority 0 path cost 0
    member: en2 flags=3<LEARNING,DISCOVER>
        ifmaxaddr 0 port 6 priority 0 path cost 0
    media: unknown type
    status: inactive
p2p0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 2304
    ether 02:03:08:a1:e0:f4
    media: autoselect
    status: inactive
awdl0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1452
    ether 06:56:3d:f6:60:08
    nd6 options=1<PERFORMNUD>
    media: autoselect
    status: inactive
ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
    inet 10.218.98.228 --> 10.64.64.64 netmask 0xff00
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1380
    inet6 fe80::b0d4:d4be:7e62:e730%utun0 prefixlen 64 scopeid 0xb
    inet6 fdd0:b291:7da7:9153:b0d4:d4be:7e62:e730 prefixlen 64
    nd6 options=1<PERFORMNUD>
On Feb 26, 2015, at 10:48 PM, Stephan Ewen se...@apache.org wrote: Hi Dulaj! Thanks for helping to debug.
My guess is that you are seeing now the same thing between JobManager and TaskManager as you saw before between JobManager and JobClient. I have a patch pending that should help the issue (see https://issues.apache.org/jira/browse/FLINK-1608), let's see if that solves it. What seems not right is that the JobManager initially accepted the TaskManager and later the communication. Can you paste the TaskManager log as well? Also: There must be something fairly unique about your network configuration, as it works on all other setups that we use (locally, cloud, test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any chance? Greetings, Stephan On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, It’s great to help out. :) Setting 127.0.0.1 instead of “localhost” in jobmanager.rpc.address, helped to build the connection to the jobmanager. Apparently localhost resolving is different in webclient and the jobmanager. I think it’s good to set jobmanager.rpc.address: 127.0.0.1 in future builds. But then I get this error when I tried to run examples. I don’t know if I should move this issue to another thread. If so please tell me. bin/flink run
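Editorial sketch (not part of the original thread): Till's suggestion above, to look for statements from the logger org.apache.flink.runtime.taskmanager.TaskManager inside the JobManager log of a local cluster, could be done with a small filter like the one below. It assumes log lines have the shape seen in this thread ("HH:MM:SS,mmm LEVEL logger - message"); the function name and the regular expression are illustrative assumptions, not anything Flink ships.

```python
import re

# Assumed line shape, taken from the log excerpts quoted in this thread:
#   "20:46:21,998 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load ..."
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+(\S+)\s+-\s+(.*)$")

# Logger name mentioned in the thread.
TASKMANAGER_LOGGER = "org.apache.flink.runtime.taskmanager.TaskManager"


def lines_from_logger(lines, logger=TASKMANAGER_LOGGER):
    """Return (timestamp, level, message) tuples for lines written by `logger`.

    Lines that do not match the assumed shape (for example stack trace
    frames) are skipped.
    """
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group(3) == logger:
            hits.append((match.group(1), match.group(2), match.group(4)))
    return hits
```

To use it, open the JobManager log file from the log directory and pass the file object (or a list of its lines) to lines_from_logger. An empty result would suggest that the embedded TaskManager never logged anything, which is itself a useful data point.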
Re: Could not build up connection to JobManager
Hi, It’s great to help out. :) Setting 127.0.0.1 instead of “localhost” in jobmanager.rpc.address, helped to build the connection to the jobmanager. Apparently localhost resolving is different in webclient and the jobmanager. I think it’s good to set jobmanager.rpc.address: 127.0.0.1 in future builds. But then I get this error when I tried to run examples. I don’t know if I should move this issue to another thread. If so please tell me. bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count 20:46:21,998 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 02/26/2015 20:46:23 Job execution switched to status RUNNING. 02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to SCHEDULED 02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to DEPLOYING 02/26/2015 20:48:03 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to FAILED akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [10 ms] at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) at 
scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691) at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) at java.lang.Thread.run(Thread.java:745) 02/26/2015 20:48:03 Job execution switched to status FAILING. 02/26/2015 20:48:03 Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to CANCELED 02/26/2015 20:48:03 DataSink(CsvOutputFormat (path: /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count, delimiter: ))(1/1) switched to CANCELED 02/26/2015 20:48:03 Job execution switched to status FAILED. org.apache.flink.client.program.ProgramInvocationException: The program execution failed. 
at org.apache.flink.client.program.Client.run(Client.java:344) at org.apache.flink.client.program.Client.run(Client.java:306) at org.apache.flink.client.program.Client.run(Client.java:300) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:250) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114) Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:284) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at
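Editorial sketch (not part of the original thread): the workaround described above was to put an IP literal (127.0.0.1) instead of the host name "localhost" into jobmanager.rpc.address. The snippet below checks which of the two a configuration file currently contains. It assumes the configuration is a flat list of "key: value" lines, as the keys quoted in this thread suggest; the helper names are made up for illustration.

```python
import ipaddress


def parse_flat_conf(text):
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments.

    Assumption: the file has no nesting, so a full YAML parser is not needed.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def rpc_address_is_ip_literal(conf):
    """True if jobmanager.rpc.address is an IP literal rather than a host name."""
    value = conf.get("jobmanager.rpc.address", "")
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False
```

With a host name such as "localhost", each process resolves the name itself, which is where the client and the JobManager in this thread ended up on different addresses. An IP literal removes that step.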
Re: Could not build up connection to JobManager
Hi Dulaj! Thanks for helping to debug. My guess is that you are seeing now the same thing between JobManager and TaskManager as you saw before between JobManager and JobClient. I have a patch pending that should help the issue (see https://issues.apache.org/jira/browse/FLINK-1608), let's see if that solves it. What seems not right is that the JobManager initially accepted the TaskManager and later the communication. Can you paste the TaskManager log as well? Also: There must be something fairly unique about your network configuration, as it works on all other setups that we use (locally, cloud, test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any chance? Greetings, Stephan On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, It’s great to help out. :) Setting 127.0.0.1 instead of “localhost” in jobmanager.rpc.address, helped to build the connection to the jobmanager. Apparently localhost resolving is different in webclient and the jobmanager. I think it’s good to set jobmanager.rpc.address: 127.0.0.1 in future builds. But then I get this error when I tried to run examples. I don’t know if I should move this issue to another thread. If so please tell me. bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count 20:46:21,998 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 02/26/2015 20:46:23 Job execution switched to status RUNNING. 
02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to SCHEDULED 02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to DEPLOYING 02/26/2015 20:48:03 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to FAILED akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [10 ms] at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691) at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) at java.lang.Thread.run(Thread.java:745) 02/26/2015 20:48:03 Job execution switched to status FAILING. 02/26/2015 20:48:03 Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to CANCELED 02/26/2015 20:48:03 DataSink(CsvOutputFormat (path: /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count, delimiter: ))(1/1) switched to CANCELED 02/26/2015 20:48:03 Job execution switched to status FAILED. 
org.apache.flink.client.program.ProgramInvocationException: The program execution failed. at org.apache.flink.client.program.Client.run(Client.java:344) at org.apache.flink.client.program.Client.run(Client.java:306) at org.apache.flink.client.program.Client.run(Client.java:300) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at
Re: Could not build up connection to JobManager
Hi, Sorry for the delay in replying on this issue. The jobmanager.rpc.address is already set to “localhost” in conf.yaml. This can’t be the issue, because the job manager web interface, which also runs on localhost, works fine. bin/flink run jar doesn’t seem to work either. Let me send you my command and the result in the terminal. bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count 20:32:16,442 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager. at org.apache.flink.client.program.Client.run(Client.java:327) at org.apache.flink.client.program.Client.run(Client.java:306) at org.apache.flink.client.program.Client.run(Client.java:300) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:250) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344) at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114) Caused by: java.io.IOException: JobManager at akka.tcp://flink@10.216.177.146:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable. at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897) at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151) at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142) at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125) at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala) at org.apache.flink.client.program.Client.run(Client.java:322) ... 15 more Caused by: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893) ... 20 more The exception above occurred while trying to run your command. On Feb 25, 2015, at 1:29 AM, Stephan Ewen se...@apache.org wrote: BTW: Does still work if you enter localhost for jobmanager.rpc.address in your flink-conf.yaml ? On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen se...@apache.org wrote: Hi! I think that this is a problem in the current master (probably in there since a few days ago). I am fixing it... Thanks for reporting it! Stephan On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen se...@apache.org wrote: Hi Dulaj! 
The log suggests that the JobManager binds itself to the IP address 10.216.192.98 and the WebClient runs at 127.0.0.1. The 127.0.0.1 actor system cannot connect to 10.216.192.98. Let me verify whether this is a quirk of your particular setup, or a bug recently introduced in 0.9-SNAPSHOT. Does the command line work for you? (bin/flink run jar) taskmanager.numberOfTaskSlots: -1 is also okay; it means that the default of '1' is used. Greetings, Stephan On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga vidura...@icloud.com wrote: Is taskmanager.numberOfTaskSlots: -1 normal? On Feb 24, 2015, at 9:44 PM, Robert Metzger rmetz...@apache.org wrote: Hi, I could not find the logfiles attached to your mails. I think
Re: Could not build up connection to JobManager
Addition: To check whether a port is reachable, I think the easiest thing is to try to connect with a telnet client and see if the connection is refused. On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen se...@apache.org wrote: Okay, the problem seems to be that even though both the client and the jobmanager use localhost as the host name, they resolve this to different IP addresses: in one case 127.0.0.1, in the other case 10.216.177.146. Also, the 127.0.0.1 address apparently cannot communicate with 10.216.177.146. Can you help us debug this by checking the following:
- Can you try to set jobmanager.rpc.address to 127.0.0.1 and see if that solves it?
- Can you try to set jobmanager.rpc.address to the other address (10.216.177.146 or so) and see if that solves it?
- Can you run start-cluster.sh rather than start-local.sh and see whether the webfrontend displays that the TaskManager connects?
- As a hard core test: Can you bring up the jobmanager, check where it connects (10.216.192.98:6123 or so) and see whether the port is reachable?
We have recently updated how the Akka URLs are built, to work around a limitation in Akka. It seems that did not yet fully solve the issue. Thanks for helping us debug this, it is not the easiest immigration experience, but the outcome is probably extremely valuable for the project :-) Greetings, Stephan On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, Sorry for the delay in replying on this issue. The jobmanager.rpc.address is already set to “localhost” in conf.yaml. This can’t be the issue, because the job manager web interface, which also runs on localhost, works fine. bin/flink run jar doesn’t seem to work either. Let me send you my command and the result in the terminal. 
bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count 20:32:16,442 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager. at org.apache.flink.client.program.Client.run(Client.java:327) at org.apache.flink.client.program.Client.run(Client.java:306) at org.apache.flink.client.program.Client.run(Client.java:300) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:250) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114) Caused by: java.io.IOException: JobManager at akka.tcp:// flink@10.216.177.146:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable. 
at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897) at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151) at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142) at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125) at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala) at org.apache.flink.client.program.Client.run(Client.java:322) ... 15 more Caused by: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at
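Editorial sketch (not part of the original thread): the telnet check suggested above, connecting to the JobManager's address and port and seeing whether the connection is refused, can also be scripted. The helper below is a generic TCP probe; the function name and default timeout are illustrative choices.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    A refused connection, an unreachable host and a timeout all count
    as "not reachable"; the helper does not distinguish between them.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the situation in this thread one would call it from the machine running the client, once with 127.0.0.1 and once with the 10.x address the JobManager reports in its log, both on port 6123. Note that a successful TCP connect only shows that something is listening; it does not prove that the Akka actor system behind the port accepts the address the client uses.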
Re: Could not build up connection to JobManager
Okay, the problem seems to be that even though both the client and the jobmanager use localhost as the host name, they resolve this to different IP addresses: in one case 127.0.0.1, in the other case 10.216.177.146. Also, the 127.0.0.1 address apparently cannot communicate with 10.216.177.146. Can you help us debug this by checking the following:
- Can you try to set jobmanager.rpc.address to 127.0.0.1 and see if that solves it?
- Can you try to set jobmanager.rpc.address to the other address (10.216.177.146 or so) and see if that solves it?
- Can you run start-cluster.sh rather than start-local.sh and see whether the webfrontend displays that the TaskManager connects?
- As a hard core test: Can you bring up the jobmanager, check where it connects (10.216.192.98:6123 or so) and see whether the port is reachable?
We have recently updated how the Akka URLs are built, to work around a limitation in Akka. It seems that did not yet fully solve the issue. Thanks for helping us debug this, it is not the easiest immigration experience, but the outcome is probably extremely valuable for the project :-) Greetings, Stephan On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga vidura...@icloud.com wrote: Hi, Sorry for the delay in replying on this issue. The jobmanager.rpc.address is already set to “localhost” in conf.yaml. This can’t be the issue, because the job manager web interface, which also runs on localhost, works fine. bin/flink run jar doesn’t seem to work either. Let me send you my command and the result in the terminal. bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count 20:32:16,442 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager. at org.apache.flink.client.program.Client.run(Client.java:327) at org.apache.flink.client.program.Client.run(Client.java:306) at org.apache.flink.client.program.Client.run(Client.java:300) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:250) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114) Caused by: java.io.IOException: JobManager at akka.tcp:// flink@10.216.177.146:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable. 
at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897) at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151) at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142) at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125) at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala) at org.apache.flink.client.program.Client.run(Client.java:322) ... 15 more Caused by: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893) ... 20 more The exception above occurred while trying to run your command. On Feb 25, 2015,
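Editorial sketch (not part of the original thread): the diagnosis above is that the same host name, localhost, resolved to 127.0.0.1 in one process and to a 10.x address in another. The snippet below lists every address the operating system's resolver returns for a name, which is a quick way to see whether a name is ambiguous on a given machine. It is only an approximation of what Flink does: the JVM performs its own lookup and address selection, so the result here is a hint, not a reproduction. The function names are made up for illustration.

```python
import socket


def resolve_all(host):
    """Return the sorted, de-duplicated addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def resolves_only_to(host, expected_ip):
    """True if `host` resolves to exactly one address and it is `expected_ip`."""
    return resolve_all(host) == [expected_ip]
```

Running resolve_all("localhost") and resolve_all(socket.gethostname()) on the affected machine, with and without the ppp0 connection from the ifconfig output active, would show whether the set of candidate addresses changes with the network setup.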
Re: Could not build up connection to JobManager
BTW: Does it still work if you enter localhost for jobmanager.rpc.address in your flink-conf.yaml? On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen se...@apache.org wrote: Hi! I think that this is a problem in the current master (probably in there since a few days ago). I am fixing it... Thanks for reporting it! Stephan On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen se...@apache.org wrote: Hi Dulaj! The log suggests that the JobManager binds itself to the IP address 10.216.192.98 and the WebClient runs at 127.0.0.1. The 127.0.0.1 actor system cannot connect to 10.216.192.98. Let me verify whether this is a quirk of your particular setup, or a bug recently introduced in 0.9-SNAPSHOT. Does the command line work for you? (bin/flink run jar) taskmanager.numberOfTaskSlots: -1 is also okay; it means that the default of '1' is used. Greetings, Stephan On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga vidura...@icloud.com wrote: Is taskmanager.numberOfTaskSlots: -1 normal? On Feb 24, 2015, at 9:44 PM, Robert Metzger rmetz...@apache.org wrote: Hi, I could not find the logfiles attached to your mails. I think the mailing lists are not accepting attachments. Can you put the logs on gist.github.com? The configuration values are documented here: http://flink.apache.org/docs/0.8/config.html For the webclient's port, it's called webclient.port On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga vidura...@icloud.com wrote: I tried to kill the job manager manually in the terminal and start it again, but no luck. Also, could you tell me if it’s possible to change the webclient’s port (8080)? On Feb 24, 2015, at 1:41 PM, Stephan Ewen se...@apache.org wrote: Hey Dulaj! As a contributor, I would go against the latest version, which is 0.9-SNAPSHOT. It may be in your case that the JobManager actor is down, but the process still lingers. (BTW: I have a patch pending that makes sure the process disappears when the actor is down). 
Could you have a look at the log flink-user-jobmanager-host-.log and see if there are any errors logged? Greetings, Stephan On 24.02.2015 at 06:29, Dulaj Viduranga vidura...@icloud.com wrote: The JobManager seems to run fine. I don't know. When I tried to run start-local.sh again, it shows the PID of the running JobManager and also :8081 runs fine. I want to contribute to the project and I could get a little boost if I could see the capabilities of FLINK. :) Will it be OK to use 0.8.1 as a developer? On Feb 24, 2015, at 04:15 AM, Stephan Ewen se...@apache.org wrote: Hi Dulaj, That error message indicates that the JobManager is not running. Are you sure that the JobManager runs properly? Anything in the JobManager logs? BTW: The 0.9 branch is under heavy development / changes. That is why it may behave a bit differently on different days right now. I would recommend using the 0.8.1 release for a stable experience. Greetings, Stephan On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger rmetz...@apache.org wrote: Thank you for the quick reply. The log you've sent is from the webclient. Can you also send the log of the JobManager? On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga vidura...@icloud.com wrote: Yes. It seems it is not a problem with the arguments. I tried for two days, but a different error occurs. It seems the web client can’t connect to the job manager although it is running. Right now, I can’t even get the webclient to run. ./bin/start-webclient.sh executes fine but I cannot connect to localhost:8080 (even with telnet or curl) Here is the log for jobManager 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer - Setting up web frontend server, using web-root directory 'jar: file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs '. 
23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-plans'. 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123. 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 23:22:32,625 INFO Remoting - Starting remoting 23:22:32,838 INFO Remoting - Remoting started; listening on addresses :[akka.tcp:// JobsInfoServletActorSystem@127.0.0.1:51517] 23:23:48,119 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://
Re: Could not build up connection to JobManager
Hi, I could not find the logfiles attached to your mails. I think the mailing lists are not accepting attachments. Can you put the logs on gist.github.com? The configuration values are documented here: http://flink.apache.org/docs/0.8/config.html For the webclient's port, it's called webclient.port On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga vidura...@icloud.com wrote: I tried to kill the job manager manually in the terminal and start it again, but no luck. Also, could you tell me if it’s possible to change the webclient’s port (8080)? On Feb 24, 2015, at 1:41 PM, Stephan Ewen se...@apache.org wrote: Hey Dulaj! As a contributor, I would go against the latest version, which is 0.9-SNAPSHOT. It may be in your case that the JobManager actor is down, but the process still lingers. (BTW: I have a patch pending that makes sure the process disappears when the actor is down). Could you have a look at the log flink-user-jobmanager-host-.log and see if there are any errors logged? Greetings, Stephan On 24.02.2015 at 06:29, Dulaj Viduranga vidura...@icloud.com wrote: The JobManager seems to run fine. I don't know. When I tried to run start-local.sh again, it shows the PID of the running JobManager and also :8081 runs fine. I want to contribute to the project and I could get a little boost if I could see the capabilities of FLINK. :) Will it be OK to use 0.8.1 as a developer? On Feb 24, 2015, at 04:15 AM, Stephan Ewen se...@apache.org wrote: Hi Dulaj, That error message indicates that the JobManager is not running. Are you sure that the JobManager runs properly? Anything in the JobManager logs? BTW: The 0.9 branch is under heavy development / changes. That is why it may behave a bit differently on different days right now. I would recommend using the 0.8.1 release for a stable experience. Greetings, Stephan On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger rmetz...@apache.org wrote: Thank you for the quick reply. The log you've sent is from the webclient. 
Can you also send the log of the JobManager? On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga vidura...@icloud.com wrote: Yes. It seams it is not a problem with the arguments. I tried two days but different error occurs. It seams the web client can’t connect to the job manager although it is running Right now, I can’t even get the webclient to run. ./bin/start-webclient.sh executes fine but I cannot connect to localhost:8080 (even with telnet or curl) Here is the log for jobManager 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer - Setting up web frontend server, using web-root directory 'jar: file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs '. 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-plans'. 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123. 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 23:22:32,625 INFO Remoting - Starting remoting 23:22:32,838 INFO Remoting - Remoting started; listening on addresses :[akka.tcp:// JobsInfoServletActorSystem@127.0.0.1:51517] 23:23:48,119 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp:// flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.218.98.169:6123 23:23:48,124 ERROR org.apache.flink.client.WebFrontend - Unexpected exception: Could not find job manager at specified address akka.flink@10.218.98.169:6123/user/jobmanager'tcp:// flink@10.218.98.169:6123/user/jobmanager. 
java.lang.RuntimeException: Could not find job manager at specified address akka.flink@10.218.98.169:6123/user/jobmanager'tcp:// flink@10.218.98.169:6123/user/jobmanager. at org.apache.flink.client.web.JobsInfoServlet.init(JobsInfoServlet.java:82) at org.apache.flink.client.web.WebInterfaceServer.init(WebInterfaceServer.java:158) at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74) On Feb 23, 2015, at 11:46 PM, Robert Metzger rmetz...@apache.org wrote: Hi, you said in the other email thread that the error only occurs for Wordcount, not for Kmeans. Can you copy me the commands for both examples? I can not really believe that there is a difference
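
Editorial sketch: the two Remoting lines in the log above carry the key information, namely the address the web client's actor system listens on and the remote address it failed to reach. The snippet below is a minimal, hypothetical helper (not part of Flink) that pulls both out of a pasted log. The function name `summarize_addresses` and the regular expressions are assumptions based only on the log format shown in this thread; other Akka or Flink versions may word these lines differently.

```python
import re

# Patterns are assumptions derived from the log lines quoted in this thread.
# The optional whitespace after "akka.tcp://" tolerates archive-mangled logs.
LISTEN_RE = re.compile(
    r"listening on addresses\s*:\s*\[akka\.tcp://\s*[^@\]]+@([\d.]+):(\d+)\]"
)
UNREACHABLE_RE = re.compile(
    r"unreachable remote address\s*\[akka\.tcp://\s*[^@\]]+@([\d.]+):(\d+)\]"
)


def summarize_addresses(log_text):
    """Return the client's listen address and the unreachable remote address.

    Each value is a (host, port) tuple, or None if the line was not found.
    """
    listen = LISTEN_RE.search(log_text)
    target = UNREACHABLE_RE.search(log_text)
    return {
        "client": (listen.group(1), int(listen.group(2))) if listen else None,
        "unreachable": (target.group(1), int(target.group(2))) if target else None,
    }
```

If the two hosts differ, as they do here (127.0.0.1 versus 10.218.98.169), that is a hint to look at how each side resolves and binds its address.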
Re: Could not build up connection to JobManager
Is taskmanager.numberOfTaskSlots: -1 normal?

On Feb 24, 2015, at 9:44 PM, Robert Metzger rmetz...@apache.org wrote:

Hi, I could not find the log files attached to your mails. [...]
Re: Could not build up connection to JobManager
Hi!

I think that this is a problem in the current master (probably in there since a few days ago). I am fixing it...

Thanks for reporting it!

Stephan

On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen se...@apache.org wrote:

Hi Dulaj!

The log suggests that the JobManager binds itself to the IP address 10.216.192.98 and the WebClient runs at 127.0.0.1. The 127.0.0.1 actor system cannot connect to 10.216.192.98.

Let me verify whether this is a quirk of your particular setup, or a bug recently introduced in 0.9-SNAPSHOT. Does the command line work for you? (bin/flink run jar)

taskmanager.numberOfTaskSlots: -1 is also okay; this means that the default of '1' is used.

Greetings,
Stephan

On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga vidura...@icloud.com wrote:

Is taskmanager.numberOfTaskSlots: -1 normal?

On Feb 24, 2015, at 9:44 PM, Robert Metzger rmetz...@apache.org wrote:

Hi, I could not find the log files attached to your mails. [...]
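
Editorial sketch: the diagnosis in this message comes down to whether a TCP connection can be opened from the client machine to the address the JobManager is bound to. The snippet below is a minimal way to check that by hand, roughly what the telnet test mentioned earlier in the thread does. The helper name `can_connect` is hypothetical and the code is not part of Flink; it only tests TCP reachability and says nothing about whether an Akka actor is actually answering on that port.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, calling `can_connect("127.0.0.1", 6123)` and then `can_connect` with the non-loopback address from your own log (6123 is the JobManager port shown in the logs above) indicates which of the two addresses actually accepts connections from the client machine.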
Re: Could not build up connection to JobManager
Hi,

you said in the other email thread that the error only occurs for Wordcount, not for Kmeans. Can you copy me the commands for both examples? I cannot really believe that there is a difference between the two jobs.

Can you also send us the contents of the jobmanager log file?

Best,
Robert

On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga vidura...@icloud.com wrote:

I’m getting “Could not build up connection to JobManager.” when I try to run the WordCount example. Can anyone help?

Dulaj