Re: Could not build up connection to JobManager

2015-03-16 Thread Till Rohrmann
It is really strange. It's right that the CliFrontend now resolves
localhost to the correct local address 10.218.100.122. Moreover, according
to the logs, the JobManager is also started and binds to akka.tcp://
flink@10.218.100.122:6123. According to the logs, this is also the address
the CliFrontend uses to connect to the JobManager. If the timestamps are
correct, then the JobManager was still alive when the job was sent. I don't
really understand why this happens. Can it be that the CliFrontend which
binds to 127.0.0.1 cannot communicate with 10.218.100.122? Can it be that
you have some settings which prevent this? For the failing 127.0.0.1 case,
it would be helpful to have access to the JobManager log.

I've updated the branch
https://github.com/tillrohrmann/flink/tree/fixJobClient with a new fix for
the localhost scenario. Could you try it out again? Thanks a lot for your
help.

Best regards,

Till

On Mon, Mar 16, 2015 at 10:30 AM, Ufuk Celebi u...@apache.org wrote:

 There was an issue for this:
 https://issues.apache.org/jira/browse/FLINK-1634

 Can we close it then?

 On Sat, Mar 14, 2015 at 9:16 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:

  Hay Stephan,
  Great to know you could fix the issue. Thank you on the update.
  Best regards.
 
   On Mar 14, 2015, at 9:19 PM, Stephan Ewen se...@apache.org wrote:
  
   Hey Dulaj!
  
   Forget what I said in the previous email. The issue with the wrong
  address
   binding seems to be solved now. There is another issue that the
 embedded
   taskmanager does not start properly, for whatever reason. My gut
 feeling
  is
   that there is something wrong
  
   There is a patch pending that changes the startup behavior to debug
 these
   situations much easier. I'll ping you as soon as that is in...
  
  
   Stephan
  
   On Sat, Mar 14, 2015 at 4:42 PM, Stephan Ewen se...@apache.org
 wrote:
  
   Hey Dulaj!
  
   One thing you can try is to add to the JVM startup options (in the
  scripts
   in the bin folder) the option -Djava.net.preferIPv4Stack=true and
  see
   if that helps it?
  
   Stephan
  
  
   On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga 
 vidura...@icloud.com
   wrote:
  
   Hi,
   Still this is no luck. I’ll upload the logs with configuration
   “localhost as well as “127.0.0.1” so you can take a look.
  
   127.0.0.1
   flink-Vidura-flink-client-localhost.log 
  
 
 https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log
  
  
   localhost
   flink-Vidura-flink-client-localhost.log 
  
 
 https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log
  
   flink-Vidura-jobmanager-localhost.log 
  
 
 https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log
  
  
   On Mar 11, 2015, at 11:32 PM, Till Rohrmann trohrm...@apache.org
   wrote:
  
   Hi Dulaj,
  
   sorry for my late response. It looks as if the JobClient tries to
   connect
   to the JobManager using its IPv6 instead of IPv4. Akka is really
 picky
   when
   it comes to remote address. If Akka binds to the FQDN, then other
   ActorSystem which try to connect to it using its IP address won't be
   successful. I assume that this might be a problem. I tried to fix
 it.
   You
   can find it here [1]. Could you please try it out by starting a
 local
   cluster with the start-local.sh script. If it fails, could you
 please
   send
   me all log files (client, jobmanager and taskmanager). Once we
 figured
   out
   why the JobCilent does not connect, we can try to tackle the
  BlobServer
   issue.
  
   Cheers,
  
   Till
  
   [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
  
   On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga 
 vidura...@icloud.com
  
   wrote:
  
   Hi,
   The error message is,
  
   21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform...
  using
   builtin-java classes where applicable
   org.apache.flink.client.program.ProgramInvocationException: Could
 not
   build up connection to JobManager.
 at
 org.apache.flink.client.program.Client.run(Client.java:327)
 at
 org.apache.flink.client.program.Client.run(Client.java:306)
 at
 org.apache.flink.client.program.Client.run(Client.java:300)
 at
  
  
 
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
 at
  
  
 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
 Method)
 at
  
  
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
  
  
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at
  
  
 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)

Re: Could not build up connection to JobManager

2015-03-16 Thread Till Rohrmann
Could you please upload the logs? They would be really helpful.

On Mon, Mar 16, 2015 at 6:11 PM, Dulaj Viduranga vidura...@icloud.com
wrote:

 Hi,
 I tested the update but it’s still the same. I think it isn’t a problem
 with my system because, I have an XAMPP server working totally fine (I
 tried with it is shut down as well) and also I doubly checked hosts files.
 I had little snitch installed but I also tried uninstalling it.
 Isn’t there a way around without using DNS to resolve localhost?

  On Mar 16, 2015, at 10:04 PM, Till Rohrmann trohrm...@apache.org
 wrote:
 
  It is really strange. It's right that the CliFrontend now resolves
  localhost to the correct local address 10.218.100.122. Moreover,
 according
  to the logs, the JobManager is also started and binds to akka.tcp://
  flink@10.218.100.122:6123. According to the logs, this is also the
 address
  the CliFrontend uses to connect to the JobManager. If the timestamps are
  correct, then the JobManager was still alive when the job was sent. I
 don't
  really understand why this happens. Can it be that the CliFrontend which
  binds to 127.0.0.1 cannot communicate with 10.218.100.122? Can it be that
  you have some settings which prevent this? For the failing 127.0.0.1
 case,
  it would be helpful to have access to the JobManager log.
 
  I've updated the branch
  https://github.com/tillrohrmann/flink/tree/fixJobClient with a new fix
 for
  the localhost scenario. Could you try it out again? Thanks a lot for
 your
  help.
 
  Best regards,
 
  Till
 
  On Mon, Mar 16, 2015 at 10:30 AM, Ufuk Celebi u...@apache.org wrote:
 
  There was an issue for this:
  https://issues.apache.org/jira/browse/FLINK-1634
 
  Can we close it then?
 
  On Sat, Mar 14, 2015 at 9:16 PM, Dulaj Viduranga vidura...@icloud.com
  wrote:
 
  Hay Stephan,
  Great to know you could fix the issue. Thank you on the update.
  Best regards.
 
  On Mar 14, 2015, at 9:19 PM, Stephan Ewen se...@apache.org wrote:
 
  Hey Dulaj!
 
  Forget what I said in the previous email. The issue with the wrong
  address
  binding seems to be solved now. There is another issue that the
  embedded
  taskmanager does not start properly, for whatever reason. My gut
  feeling
  is
  that there is something wrong
 
  There is a patch pending that changes the startup behavior to debug
  these
  situations much easier. I'll ping you as soon as that is in...
 
 
  Stephan
 
  On Sat, Mar 14, 2015 at 4:42 PM, Stephan Ewen se...@apache.org
  wrote:
 
  Hey Dulaj!
 
  One thing you can try is to add to the JVM startup options (in the
  scripts
  in the bin folder) the option -Djava.net.preferIPv4Stack=true and
  see
  if that helps it?
 
  Stephan
 
 
  On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga 
  vidura...@icloud.com
  wrote:
 
  Hi,
  Still this is no luck. I’ll upload the logs with configuration
  “localhost as well as “127.0.0.1” so you can take a look.
 
  127.0.0.1
  flink-Vidura-flink-client-localhost.log 
 
 
 
 https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log
 
 
  localhost
  flink-Vidura-flink-client-localhost.log 
 
 
 
 https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log
 
  flink-Vidura-jobmanager-localhost.log 
 
 
 
 https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log
 
 
  On Mar 11, 2015, at 11:32 PM, Till Rohrmann trohrm...@apache.org
  wrote:
 
  Hi Dulaj,
 
  sorry for my late response. It looks as if the JobClient tries to
  connect
  to the JobManager using its IPv6 instead of IPv4. Akka is really
  picky
  when
  it comes to remote address. If Akka binds to the FQDN, then other
  ActorSystem which try to connect to it using its IP address won't
 be
  successful. I assume that this might be a problem. I tried to fix
  it.
  You
  can find it here [1]. Could you please try it out by starting a
  local
  cluster with the start-local.sh script. If it fails, could you
  please
  send
  me all log files (client, jobmanager and taskmanager). Once we
  figured
  out
  why the JobCilent does not connect, we can try to tackle the
  BlobServer
  issue.
 
  Cheers,
 
  Till
 
  [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
 
  On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga 
  vidura...@icloud.com
 
  wrote:
 
  Hi,
  The error message is,
 
  21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
  - Unable to load native-hadoop library for your platform...
  using
  builtin-java classes where applicable
  org.apache.flink.client.program.ProgramInvocationException: Could
  not
  build up connection to JobManager.
   at
  org.apache.flink.client.program.Client.run(Client.java:327)
   at
  org.apache.flink.client.program.Client.run(Client.java:306)
   at
  org.apache.flink.client.program.Client.run(Client.java:300)
   at
 
 
 
 
 

Re: Could not build up connection to JobManager

2015-03-16 Thread Ufuk Celebi
There was an issue for this:
https://issues.apache.org/jira/browse/FLINK-1634

Can we close it then?

On Sat, Mar 14, 2015 at 9:16 PM, Dulaj Viduranga vidura...@icloud.com
wrote:

 Hay Stephan,
 Great to know you could fix the issue. Thank you on the update.
 Best regards.

  On Mar 14, 2015, at 9:19 PM, Stephan Ewen se...@apache.org wrote:
 
  Hey Dulaj!
 
  Forget what I said in the previous email. The issue with the wrong
 address
  binding seems to be solved now. There is another issue that the embedded
  taskmanager does not start properly, for whatever reason. My gut feeling
 is
  that there is something wrong
 
  There is a patch pending that changes the startup behavior to debug these
  situations much easier. I'll ping you as soon as that is in...
 
 
  Stephan
 
  On Sat, Mar 14, 2015 at 4:42 PM, Stephan Ewen se...@apache.org wrote:
 
  Hey Dulaj!
 
  One thing you can try is to add to the JVM startup options (in the
 scripts
  in the bin folder) the option -Djava.net.preferIPv4Stack=true and
 see
  if that helps it?
 
  Stephan
 
 
  On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga vidura...@icloud.com
  wrote:
 
  Hi,
  Still this is no luck. I’ll upload the logs with configuration
  “localhost as well as “127.0.0.1” so you can take a look.
 
  127.0.0.1
  flink-Vidura-flink-client-localhost.log 
 
 https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log
 
 
  localhost
  flink-Vidura-flink-client-localhost.log 
 
 https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log
 
  flink-Vidura-jobmanager-localhost.log 
 
 https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log
 
 
  On Mar 11, 2015, at 11:32 PM, Till Rohrmann trohrm...@apache.org
  wrote:
 
  Hi Dulaj,
 
  sorry for my late response. It looks as if the JobClient tries to
  connect
  to the JobManager using its IPv6 instead of IPv4. Akka is really picky
  when
  it comes to remote address. If Akka binds to the FQDN, then other
  ActorSystem which try to connect to it using its IP address won't be
  successful. I assume that this might be a problem. I tried to fix it.
  You
  can find it here [1]. Could you please try it out by starting a local
  cluster with the start-local.sh script. If it fails, could you please
  send
  me all log files (client, jobmanager and taskmanager). Once we figured
  out
  why the JobCilent does not connect, we can try to tackle the
 BlobServer
  issue.
 
  Cheers,
 
  Till
 
  [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
 
  On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga vidura...@icloud.com
 
  wrote:
 
  Hi,
  The error message is,
 
  21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
   - Unable to load native-hadoop library for your platform...
 using
  builtin-java classes where applicable
  org.apache.flink.client.program.ProgramInvocationException: Could not
  build up connection to JobManager.
at org.apache.flink.client.program.Client.run(Client.java:327)
at org.apache.flink.client.program.Client.run(Client.java:306)
at org.apache.flink.client.program.Client.run(Client.java:300)
at
 
 
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
at
 
 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
 
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at
 
 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at
 
 
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:250)
at
 
 
 org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
at
 org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
at
 
 
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
at
  org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
  Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80
  :0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager
  not reachable. Please make sure that the JobManager is running and
 its
  port
  is reachable.
at
 
 
 org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
at
 
 
 org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
at
 
 
 org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
at
 
 
 

Re: Could not build up connection to JobManager

2015-03-14 Thread Stephan Ewen
Hey Dulaj!

One thing you can try is to add to the JVM startup options (in the scripts
in the bin folder) the option -Djava.net.preferIPv4Stack=true and see
if that helps it?

Stephan


On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga vidura...@icloud.com
wrote:

 Hi,
 Still this is no luck. I’ll upload the logs with configuration “localhost
 as well as “127.0.0.1” so you can take a look.

 127.0.0.1
 flink-Vidura-flink-client-localhost.log 
 https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log
 

 localhost
 flink-Vidura-flink-client-localhost.log 
 https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log
 
 flink-Vidura-jobmanager-localhost.log 
 https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log
 

  On Mar 11, 2015, at 11:32 PM, Till Rohrmann trohrm...@apache.org
 wrote:
 
  Hi Dulaj,
 
  sorry for my late response. It looks as if the JobClient tries to connect
  to the JobManager using its IPv6 instead of IPv4. Akka is really picky
 when
  it comes to remote address. If Akka binds to the FQDN, then other
  ActorSystem which try to connect to it using its IP address won't be
  successful. I assume that this might be a problem. I tried to fix it. You
  can find it here [1]. Could you please try it out by starting a local
  cluster with the start-local.sh script. If it fails, could you please
 send
  me all log files (client, jobmanager and taskmanager). Once we figured
 out
  why the JobCilent does not connect, we can try to tackle the BlobServer
  issue.
 
  Cheers,
 
  Till
 
  [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
 
  On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga vidura...@icloud.com
  wrote:
 
  Hi,
  The error message is,
 
  21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
  builtin-java classes where applicable
  org.apache.flink.client.program.ProgramInvocationException: Could not
  build up connection to JobManager.
 at org.apache.flink.client.program.Client.run(Client.java:327)
 at org.apache.flink.client.program.Client.run(Client.java:306)
 at org.apache.flink.client.program.Client.run(Client.java:300)
 at
 
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
 at
 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at
 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
 at
 
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 at org.apache.flink.client.program.Client.run(Client.java:250)
 at
  org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
 at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
 at
 
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
 at
 org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
  Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80
 :0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager
  not reachable. Please make sure that the JobManager is running and its
 port
  is reachable.
 at
 
 org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
 at
 
 org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
 at
 
 org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
 at
 
 org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
 at
 
 org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
 at org.apache.flink.client.program.Client.run(Client.java:322)
 ... 15 more
  Caused by: akka.actor.ActorNotFound: Actor not found for:
  ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
 at
 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
 at
 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
 at
 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
 at
 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
 at
 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)

Re: Could not build up connection to JobManager

2015-03-14 Thread Stephan Ewen
Hey Dulaj!

Forget what I said in the previous email. The issue with the wrong address
binding seems to be solved now. There is another issue that the embedded
taskmanager does not start properly, for whatever reason. My gut feeling is
that there is something wrong

There is a patch pending that changes the startup behavior to debug these
situations much easier. I'll ping you as soon as that is in...


Stephan

On Sat, Mar 14, 2015 at 4:42 PM, Stephan Ewen se...@apache.org wrote:

 Hey Dulaj!

 One thing you can try is to add to the JVM startup options (in the scripts
 in the bin folder) the option -Djava.net.preferIPv4Stack=true and see
 if that helps it?

 Stephan


 On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga vidura...@icloud.com
 wrote:

 Hi,
 Still this is no luck. I’ll upload the logs with configuration
 “localhost as well as “127.0.0.1” so you can take a look.

 127.0.0.1
 flink-Vidura-flink-client-localhost.log 
 https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log
 

 localhost
 flink-Vidura-flink-client-localhost.log 
 https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log
 
 flink-Vidura-jobmanager-localhost.log 
 https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log
 

  On Mar 11, 2015, at 11:32 PM, Till Rohrmann trohrm...@apache.org
 wrote:
 
  Hi Dulaj,
 
  sorry for my late response. It looks as if the JobClient tries to
 connect
  to the JobManager using its IPv6 instead of IPv4. Akka is really picky
 when
  it comes to remote address. If Akka binds to the FQDN, then other
  ActorSystem which try to connect to it using its IP address won't be
  successful. I assume that this might be a problem. I tried to fix it.
 You
  can find it here [1]. Could you please try it out by starting a local
  cluster with the start-local.sh script. If it fails, could you please
 send
  me all log files (client, jobmanager and taskmanager). Once we figured
 out
  why the JobCilent does not connect, we can try to tackle the BlobServer
  issue.
 
  Cheers,
 
  Till
 
  [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
 
  On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga vidura...@icloud.com
  wrote:
 
  Hi,
  The error message is,
 
  21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
  builtin-java classes where applicable
  org.apache.flink.client.program.ProgramInvocationException: Could not
  build up connection to JobManager.
 at org.apache.flink.client.program.Client.run(Client.java:327)
 at org.apache.flink.client.program.Client.run(Client.java:306)
 at org.apache.flink.client.program.Client.run(Client.java:300)
 at
 
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
 at
 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at
 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
 at
 
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 at org.apache.flink.client.program.Client.run(Client.java:250)
 at
 
 org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
 at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
 at
 
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
 at
 org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
  Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80
 :0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager
  not reachable. Please make sure that the JobManager is running and its
 port
  is reachable.
 at
 
 org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
 at
 
 org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
 at
 
 org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
 at
 
 org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
 at
 
 org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
 at org.apache.flink.client.program.Client.run(Client.java:322)
 ... 15 more
  Caused by: akka.actor.ActorNotFound: Actor not found for:
  ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
 at
 
 

Re: Could not build up connection to JobManager

2015-03-13 Thread Dulaj Viduranga
Hi,
Still this is no luck. I’ll upload the logs with configuration “localhost as 
well as “127.0.0.1” so you can take a look.

127.0.0.1
flink-Vidura-flink-client-localhost.log 
https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log

localhost
flink-Vidura-flink-client-localhost.log 
https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log
flink-Vidura-jobmanager-localhost.log 
https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log

 On Mar 11, 2015, at 11:32 PM, Till Rohrmann trohrm...@apache.org wrote:
 
 Hi Dulaj,
 
 sorry for my late response. It looks as if the JobClient tries to connect
 to the JobManager using its IPv6 instead of IPv4. Akka is really picky when
 it comes to remote address. If Akka binds to the FQDN, then other
 ActorSystem which try to connect to it using its IP address won't be
 successful. I assume that this might be a problem. I tried to fix it. You
 can find it here [1]. Could you please try it out by starting a local
 cluster with the start-local.sh script. If it fails, could you please send
 me all log files (client, jobmanager and taskmanager). Once we figured out
 why the JobCilent does not connect, we can try to tackle the BlobServer
 issue.
 
 Cheers,
 
 Till
 
 [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
 
 On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 Hi,
 The error message is,
 
 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
   - Unable to load native-hadoop library for your platform... using
 builtin-java classes where applicable
 org.apache.flink.client.program.ProgramInvocationException: Could not
 build up connection to JobManager.
at org.apache.flink.client.program.Client.run(Client.java:327)
at org.apache.flink.client.program.Client.run(Client.java:306)
at org.apache.flink.client.program.Client.run(Client.java:300)
at
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
at
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:250)
at
 org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
at
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
 Caused by: java.io.IOException: JobManager at 
 akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager
 not reachable. Please make sure that the JobManager is running and its port
 is reachable.
at
 org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
at
 org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
at
 org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
at
 org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
at
 org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
at org.apache.flink.client.program.Client.run(Client.java:322)
... 15 more
 Caused by: akka.actor.ActorNotFound: Actor not found for:
 ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
at
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at
 scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at
 akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at
 

Re: Could not build up connection to JobManager

2015-03-11 Thread Till Rohrmann
Hi Dulaj,

sorry for my late response. It looks as if the JobClient tries to connect
to the JobManager using its IPv6 instead of IPv4. Akka is really picky when
it comes to remote address. If Akka binds to the FQDN, then other
ActorSystem which try to connect to it using its IP address won't be
successful. I assume that this might be a problem. I tried to fix it. You
can find it here [1]. Could you please try it out by starting a local
cluster with the start-local.sh script. If it fails, could you please send
me all log files (client, jobmanager and taskmanager). Once we figured out
why the JobCilent does not connect, we can try to tackle the BlobServer
issue.

Cheers,

Till

[1] https://github.com/tillrohrmann/flink/tree/fixJobClient

On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga vidura...@icloud.com
wrote:

 Hi,
 The error message is,

 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
 builtin-java classes where applicable
 org.apache.flink.client.program.ProgramInvocationException: Could not
 build up connection to JobManager.
 at org.apache.flink.client.program.Client.run(Client.java:327)
 at org.apache.flink.client.program.Client.run(Client.java:306)
 at org.apache.flink.client.program.Client.run(Client.java:300)
 at
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
 at
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
 at
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 at org.apache.flink.client.program.Client.run(Client.java:250)
 at
 org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
 at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
 at
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
 at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
 Caused by: java.io.IOException: JobManager at 
 akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager
 not reachable. Please make sure that the JobManager is running and its port
 is reachable.
 at
 org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
 at
 org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
 at
 org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
 at
 org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
 at
 org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
 at org.apache.flink.client.program.Client.run(Client.java:322)
 ... 15 more
 Caused by: akka.actor.ActorNotFound: Actor not found for:
 ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
 at
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
 at
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
 at
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
 at
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
 at
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at
 scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
 at
 akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
 at
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
 at
 akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
 at
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
 at
 scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
 at
 scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
 at
 scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
 at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
 at
 

Re: Could not build up connection to JobManager

2015-03-05 Thread Dulaj Viduranga
Hi Till,
I’m sorry. It doesn’t seem to solve the problem. The taskmanager still tries a 
10.0.0.0/8 IP.

Best regards.

 On Mar 5, 2015, at 1:00 PM, Till Rohrmann till.rohrm...@gmail.com wrote:
 
 Hi Dulaj,
 
 I looked through your commit and noticed that the JobClient might not be
 listening on the right network interface. Your commit seems to fix it. I
 just want to understand the problem properly and therefore I opened a
 branch with a small change. Could you try out whether this change would
 also fix your problem? You can find the code here [1]. Would be awesome if
 you checked it out and let it run on your cluster setting. Thanks a lot
 Dulaj!
 
 [1]
 https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
 
 On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 The every change in the commit b7da22a is not required but I thought they
 are appropriate.
 
 On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 Hi,
 I found many other places “localhost” is hard coded. I changed them in a
 better way I think. I made a pull request. Please review. b7da22a 
 https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
 
 
 On Mar 4, 2015, at 8:17 PM, Stephan Ewen se...@apache.org wrote:
 
 If I recall correctly, we only hardcode localhost in the local mini
 cluster - do you think it is problematic there as well?
 
 Have you found any other places?
 
 On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 In some places of the code, localhost is hard coded. When it is
 resolved
 by the DNS, it is posible to be directed  to a different IP other than
 127.0.0.1 (like private range 10.0.0.0/8). I changed those places to
 127.0.0.1 and it works like a charm.
 But hard coding 127.0.0.1 is not a good option because when the
 jobmanager
 ip is changed, this becomes an issue again. I'm thinking of setting
 jobmanager ip from the config.yaml to these places.
 If you have a better idea on doing this with your experience, please
 let
 me know.
 
 Best.
 
 
 
 



Re: Could not build up connection to JobManager

2015-03-05 Thread Dulaj Viduranga
Hi,
The error message is,

21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader  
 - Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable
org.apache.flink.client.program.ProgramInvocationException: Could not build up 
connection to JobManager.
at org.apache.flink.client.program.Client.run(Client.java:327)
at org.apache.flink.client.program.Client.run(Client.java:306)
at org.apache.flink.client.program.Client.run(Client.java:300)
at 
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
at 
org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:250)
at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
Caused by: java.io.IOException: JobManager at 
akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager not 
reachable. Please make sure that the JobManager is running and its port is 
reachable.
at 
org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
at 
org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
at 
org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
at 
org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
at 
org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
at org.apache.flink.client.program.Client.run(Client.java:322)
... 15 more
Caused by: akka.actor.ActorNotFound: Actor not found for: 
ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at 
akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at 
akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
at 
akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at 
scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
at 
scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
at 
org.apache.flink.runtime.akka.AkkaUtils$.getReference(AkkaUtils.scala:321)
at 
org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:952)
... 20 more

The exception above occurred while trying to run your command.





Client log doesn’t seem to show any info,


21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader  
 - Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable
21:06:01,935 INFO  org.apache.flink.api.java.ExecutionEnvironment   
 - The job has 0 registered types and 0 default Kryo serializers
21:06:02,857 INFO  akka.event.slf4j.Slf4jLogger 
 - Slf4jLogger started
21:06:02,909 INFO  Remoting 
 - Starting remoting
21:06:03,158 INFO  Remoting 

Re: Could not build up connection to JobManager

2015-03-05 Thread Dulaj Viduranga
Glad I could help any way. :)
When the address is set to “localhost” I cannot submit a job. It immediately 
fails. But the address is “127.0.0.1”, it is stuck a little whyle on DEPLOYING 
and the fails.
Correct me if I’m wrong but I think since using the address, hardcoded in 
config file, won’t harm anything, it will be safer to use it rather than 
defining it in the code. 

 On Mar 5, 2015, at 6:57 PM, Till Rohrmann trohrm...@apache.org wrote:
 
 Could you submit a job when you set the job manager address to localhost?
 I did not see any logging statements of received jobs. If you did, could
 you also send the logs of the client?
 
 The 0.0.0.0 to which the BlobServer binds works for me on my machine. I
 cannot remember that we had problems with that before. But I agree, we
 should set it to the network interface which the JobManager uses.
 
 I cannot explain why your fix solves the problem. It does not touch any of
 the JobClient/JobManager logic.
 
 I updated my local branch [1] with a fix for the BlobServer. Could you try
 it out again and send us the logs? Thanks a lot for your help Dulaj.
 
 On Thu, Mar 5, 2015 at 1:24 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 But can you explain why did my fix solved it?
 
 On Mar 5, 2015, at 5:50 PM, Stephan Ewen se...@apache.org wrote:
 
 Hi Dulaj!
 
 Okay, the logs give us some insight. Both setups seem to look good in
 terms
 of TaskManager and JobManager startup.
 
 In one of the logs (127.0.0.1) you submit a job. The job fails because
 the
 TaskManager cannot grab the JAR file from the JobManager.
 I think the problem is that the BLOB server binds to 0.0.0.0 - it should
 bind to the same address as the JobManager actor system.
 
 That should definitely be changed...
 
 On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 Hi,
 This is the log with setting “localhost”
 flink-Vidura-jobmanager-localhost.log 
 
 https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log
 
 
 And this is the log with setting “127.0.0.1”
 flink-Vidura-jobmanager-localhost.log 
 
 https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log
 
 
 On Mar 5, 2015, at 2:23 PM, Till Rohrmann trohrm...@apache.org
 wrote:
 
 What does the jobmanager log says? I think Stephan added some more
 logging
 output which helps us to debug this problem.
 
 On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 Using start-locat.sh.
 I’m using the original config yaml. I also tried changing jobmanager
 address in config to “127.0.0.1 but no luck. With my changes it works
 ok.
 The conf file follows.
 
 
 
 
 
 #  Licensed to the Apache Software Foundation (ASF) under one
 #  or more contributor license agreements.  See the NOTICE file
 #  distributed with this work for additional information
 #  regarding copyright ownership.  The ASF licenses this file
 #  to you under the Apache License, Version 2.0 (the
 #  License); you may not use this file except in compliance
 #  with the License.  You may obtain a copy of the License at
 #
 #  http://www.apache.org/licenses/LICENSE-2.0
 #
 #  Unless required by applicable law or agreed to in writing, software
 #  distributed under the License is distributed on an AS IS BASIS,
 #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
 implied.
 #  See the License for the specific language governing permissions and
 # limitations under the License.
 
 
 
 
 
 
 
 
 
 #==
 # Common
 
 
 
 #==
 
 jobmanager.rpc.address: 127.0.0.1
 
 jobmanager.rpc.port: 6123
 
 jobmanager.heap.mb: 256
 
 taskmanager.heap.mb: 512
 
 taskmanager.numberOfTaskSlots: 1
 
 parallelization.degree.default: 1
 
 
 
 
 #==
 # Web Frontend
 
 
 
 #==
 
 # The port under which the web-based runtime monitor listens.
 # A value of -1 deactivates the web server.
 
 jobmanager.web.port: 8081
 
 # The port uder which the standalone web client
 # (for job upload and submit) listens.
 
 webclient.port: 8080
 
 
 
 
 #==
 # Advanced
 
 
 
 #==
 
 # The number of buffers for the network stack.
 #
 # taskmanager.network.numberOfBuffers: 2048
 
 # Directories for temporary files.
 #
 # Add a delimited list for multiple directories, using the system
 directory
 # delimiter (colon ':' on unix) or a comma, e.g.:
 # 

Re: Could not build up connection to JobManager

2015-03-05 Thread Stephan Ewen
Hi Dulaj!

Okay, the logs give us some insight. Both setups seem to look good in terms
of TaskManager and JobManager startup.

In one of the logs (127.0.0.1) you submit a job. The job fails because the
TaskManager cannot grab the JAR file from the JobManager.
I think the problem is that the BLOB server binds to 0.0.0.0 - it should
bind to the same address as the JobManager actor system.

That should definitely be changed...

On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga vidura...@icloud.com
wrote:

 Hi,
 This is the log with setting “localhost”
 flink-Vidura-jobmanager-localhost.log 
 https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log
 

 And this is the log with setting “127.0.0.1”
 flink-Vidura-jobmanager-localhost.log 
 https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log
 

  On Mar 5, 2015, at 2:23 PM, Till Rohrmann trohrm...@apache.org wrote:
 
  What does the jobmanager log says? I think Stephan added some more
 logging
  output which helps us to debug this problem.
 
  On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga vidura...@icloud.com
  wrote:
 
  Using start-locat.sh.
  I’m using the original config yaml. I also tried changing jobmanager
  address in config to “127.0.0.1 but no luck. With my changes it works
 ok.
  The conf file follows.
 
 
 
 
  #  Licensed to the Apache Software Foundation (ASF) under one
  #  or more contributor license agreements.  See the NOTICE file
  #  distributed with this work for additional information
  #  regarding copyright ownership.  The ASF licenses this file
  #  to you under the Apache License, Version 2.0 (the
  #  License); you may not use this file except in compliance
  #  with the License.  You may obtain a copy of the License at
  #
  #  http://www.apache.org/licenses/LICENSE-2.0
  #
  #  Unless required by applicable law or agreed to in writing, software
  #  distributed under the License is distributed on an AS IS BASIS,
  #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
 implied.
  #  See the License for the specific language governing permissions and
  # limitations under the License.
 
 
 
 
 
 
 
 #==
  # Common
 
 
 #==
 
  jobmanager.rpc.address: 127.0.0.1
 
  jobmanager.rpc.port: 6123
 
  jobmanager.heap.mb: 256
 
  taskmanager.heap.mb: 512
 
  taskmanager.numberOfTaskSlots: 1
 
  parallelization.degree.default: 1
 
 
 
 #==
  # Web Frontend
 
 
 #==
 
  # The port under which the web-based runtime monitor listens.
  # A value of -1 deactivates the web server.
 
  jobmanager.web.port: 8081
 
  # The port uder which the standalone web client
  # (for job upload and submit) listens.
 
  webclient.port: 8080
 
 
 
 #==
  # Advanced
 
 
 #==
 
  # The number of buffers for the network stack.
  #
  # taskmanager.network.numberOfBuffers: 2048
 
  # Directories for temporary files.
  #
  # Add a delimited list for multiple directories, using the system
 directory
  # delimiter (colon ':' on unix) or a comma, e.g.:
  # /data1/tmp:/data2/tmp:/data3/tmp
  #
  # Note: Each directory entry is read from and written to by a different
 I/O
  # thread. You can include the same directory multiple times in order to
  create
  # multiple I/O threads against that directory. This is for example
  relevant for
  # high-throughput RAIDs.
  #
  # If not specified, the system-specific Java temporary directory
  (java.io.tmpdir
  # property) is taken.
  #
  # taskmanager.tmp.dirs: /tmp
 
  # Path to the Hadoop configuration directory.
  #
  # This configuration is used when writing into HDFS. Unless specified
  otherwise,
  # HDFS file creation will use HDFS default settings with respect to
  block-size,
  # replication factor, etc.
  #
  # You can also directly specify the paths to hdfs-default.xml and
  hdfs-site.xml
  # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
  #
  # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
 
 
  On Mar 5, 2015, at 2:03 PM, Till Rohrmann trohrm...@apache.org
 wrote:
 
  How did you start the flink cluster? Using the start-local.sh, the
  start-cluster.sh or starting the job manager and task managers
  individually
  using taskmanager.sh/jobmanager.sh. Could you maybe post the
  flink-conf.yaml file, you're using?
 
  With your changes, everything works, right?
 
  On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga 

Re: Could not build up connection to JobManager

2015-03-05 Thread Till Rohrmann
Could you submit a job when you set the job manager address to localhost?
I did not see any logging statements of received jobs. If you did, could
you also send the logs of the client?

The 0.0.0.0 to which the BlobServer binds works for me on my machine. I
cannot remember that we had problems with that before. But I agree, we
should set it to the network interface which the JobManager uses.

I cannot explain why your fix solves the problem. It does not touch any of
the JobClient/JobManager logic.

I updated my local branch [1] with a fix for the BlobServer. Could you try
it out again and send us the logs? Thanks a lot for your help Dulaj.

On Thu, Mar 5, 2015 at 1:24 PM, Dulaj Viduranga vidura...@icloud.com
wrote:

 But can you explain why did my fix solved it?

  On Mar 5, 2015, at 5:50 PM, Stephan Ewen se...@apache.org wrote:
 
  Hi Dulaj!
 
  Okay, the logs give us some insight. Both setups seem to look good in
 terms
  of TaskManager and JobManager startup.
 
  In one of the logs (127.0.0.1) you submit a job. The job fails because
 the
  TaskManager cannot grab the JAR file from the JobManager.
  I think the problem is that the BLOB server binds to 0.0.0.0 - it should
  bind to the same address as the JobManager actor system.
 
  That should definitely be changed...
 
  On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga vidura...@icloud.com
  wrote:
 
  Hi,
  This is the log with setting “localhost”
  flink-Vidura-jobmanager-localhost.log 
 
 https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log
 
 
  And this is the log with setting “127.0.0.1”
  flink-Vidura-jobmanager-localhost.log 
 
 https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log
 
 
  On Mar 5, 2015, at 2:23 PM, Till Rohrmann trohrm...@apache.org
 wrote:
 
  What does the jobmanager log says? I think Stephan added some more
  logging
  output which helps us to debug this problem.
 
  On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga vidura...@icloud.com
  wrote:
 
  Using start-locat.sh.
  I’m using the original config yaml. I also tried changing jobmanager
  address in config to “127.0.0.1 but no luck. With my changes it works
  ok.
  The conf file follows.
 
 
 
 
 
  #  Licensed to the Apache Software Foundation (ASF) under one
  #  or more contributor license agreements.  See the NOTICE file
  #  distributed with this work for additional information
  #  regarding copyright ownership.  The ASF licenses this file
  #  to you under the Apache License, Version 2.0 (the
  #  License); you may not use this file except in compliance
  #  with the License.  You may obtain a copy of the License at
  #
  #  http://www.apache.org/licenses/LICENSE-2.0
  #
  #  Unless required by applicable law or agreed to in writing, software
  #  distributed under the License is distributed on an AS IS BASIS,
  #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
  implied.
  #  See the License for the specific language governing permissions and
  # limitations under the License.
 
 
 
 
 
 
 
 
 
 #==
  # Common
 
 
 
 #==
 
  jobmanager.rpc.address: 127.0.0.1
 
  jobmanager.rpc.port: 6123
 
  jobmanager.heap.mb: 256
 
  taskmanager.heap.mb: 512
 
  taskmanager.numberOfTaskSlots: 1
 
  parallelization.degree.default: 1
 
 
 
 
 #==
  # Web Frontend
 
 
 
 #==
 
  # The port under which the web-based runtime monitor listens.
  # A value of -1 deactivates the web server.
 
  jobmanager.web.port: 8081
 
  # The port uder which the standalone web client
  # (for job upload and submit) listens.
 
  webclient.port: 8080
 
 
 
 
 #==
  # Advanced
 
 
 
 #==
 
  # The number of buffers for the network stack.
  #
  # taskmanager.network.numberOfBuffers: 2048
 
  # Directories for temporary files.
  #
  # Add a delimited list for multiple directories, using the system
  directory
  # delimiter (colon ':' on unix) or a comma, e.g.:
  # /data1/tmp:/data2/tmp:/data3/tmp
  #
  # Note: Each directory entry is read from and written to by a
 different
  I/O
  # thread. You can include the same directory multiple times in order
 to
  create
  # multiple I/O threads against that directory. This is for example
  relevant for
  # high-throughput RAIDs.
  #
  # If not specified, the system-specific Java temporary directory
  (java.io.tmpdir
  # property) is taken.
  #
  # 

Re: Could not build up connection to JobManager

2015-03-04 Thread Dulaj Viduranga
Hi,
I found many other places “localhost” is hard coded. I changed them in a better 
way I think. I made a pull request. Please review. b7da22a 
https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd

 On Mar 4, 2015, at 8:17 PM, Stephan Ewen se...@apache.org wrote:
 
 If I recall correctly, we only hardcode localhost in the local mini
 cluster - do you think it is problematic there as well?
 
 Have you found any other places?
 
 On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 In some places of the code, localhost is hard coded. When it is resolved
 by the DNS, it is posible to be directed  to a different IP other than
 127.0.0.1 (like private range 10.0.0.0/8). I changed those places to
 127.0.0.1 and it works like a charm.
 But hard coding 127.0.0.1 is not a good option because when the jobmanager
 ip is changed, this becomes an issue again. I'm thinking of setting
 jobmanager ip from the config.yaml to these places.
 If you have a better idea on doing this with your experience, please let
 me know.
 
 Best.
 



Re: Could not build up connection to JobManager

2015-03-04 Thread Dulaj Viduranga
The every change in the commit b7da22a is not required but I thought they are 
appropriate.

 On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga vidura...@icloud.com wrote:
 
 Hi,
 I found many other places “localhost” is hard coded. I changed them in a 
 better way I think. I made a pull request. Please review. b7da22a 
 https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
 
 On Mar 4, 2015, at 8:17 PM, Stephan Ewen se...@apache.org wrote:
 
 If I recall correctly, we only hardcode localhost in the local mini
 cluster - do you think it is problematic there as well?
 
 Have you found any other places?
 
 On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 In some places of the code, localhost is hard coded. When it is resolved
 by the DNS, it is posible to be directed  to a different IP other than
 127.0.0.1 (like private range 10.0.0.0/8). I changed those places to
 127.0.0.1 and it works like a charm.
 But hard coding 127.0.0.1 is not a good option because when the jobmanager
 ip is changed, this becomes an issue again. I'm thinking of setting
 jobmanager ip from the config.yaml to these places.
 If you have a better idea on doing this with your experience, please let
 me know.
 
 Best.
 
 



Re: Could not build up connection to JobManager

2015-03-02 Thread Dulaj Viduranga

Hi,
I found the fix for this issue and I'll create a pull request in the following 
day.

Re: Could not build up connection to JobManager

2015-03-02 Thread Robert Metzger
Calling:
java -cp ../examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar
org.apache.flink.examples.java.clustering.util.KMeansDataGenerator 500 10
0.08

Will not connect to Flink. Its just running a standalone KMeans data
generator, not KMeans.
I would suspect that the KMeans example is not running as well.

You can run the KMeans example like this:
bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar.



On Sat, Feb 28, 2015 at 5:47 AM, Dulaj Viduranga vidura...@icloud.com
wrote:

 Hi,
 I’m thinking I’m doing something wrong. After setting jobManager address
 to 127.0.0.1, I can run kmeans example (java -cp
 ../examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar
 org.apache.flink.examples.java.clustering.util.KMeansDataGenerator 500 10
 0.08)
 But I can’t run word count example (bin/flink run
 ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
 file:'///Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt'
 file:'///Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count.txt’)

 I’m not sure whether I’m running it wrong

  On Feb 26, 2015, at 9:03 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
  Hi,
It’s great to help out. :)
 
Setting 127.0.0.1 instead of “localhost” in
 jobmanager.rpc.address, helped to build the connection to the jobmanager.
 Apparently localhost resolving is different in webclient and the
 jobmanager. I think it’s good to set jobmanager.rpc.address: 127.0.0.1 in
 future builds.
But then I get this error when I tried to run examples. I don’t
 know if I should move this issue to another thread. If so please tell me.
 
  bin/flink run
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
 $FLINK_DIRECTORY/count
 
 
  20:46:21,998 WARN  org.apache.hadoop.util.NativeCodeLoader
  - Unable to load native-hadoop library for your platform... using
 builtin-java classes where applicable
  02/26/2015 20:46:23   Job execution switched to status RUNNING.
  02/26/2015 20:46:23   CHAIN DataSource (at
 getTextDataSet(WordCount.java:141)
 (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at
 main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1)
 switched to SCHEDULED
  02/26/2015 20:46:23   CHAIN DataSource (at
 getTextDataSet(WordCount.java:141)
 (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at
 main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1)
 switched to DEPLOYING
  02/26/2015 20:48:03   CHAIN DataSource (at
 getTextDataSet(WordCount.java:141)
 (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at
 main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1)
 switched to FAILED
  akka.pattern.AskTimeoutException: Ask timed out on
 [Actor[akka://flink/user/taskmanager#-1628133761]] after [10 ms]
at
 akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at
 scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at
 scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
at
 akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at
 akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at
 akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at
 akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)
 
  02/26/2015 20:48:03   Job execution switched to status FAILING.
  02/26/2015 20:48:03   Reduce (SUM(1), at main(WordCount.java:72)(1/1)
 switched to CANCELED
  02/26/2015 20:48:03   DataSink(CsvOutputFormat (path:
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count,
 delimiter:  ))(1/1) switched to CANCELED
  02/26/2015 20:48:03   Job execution switched to status FAILED.
  org.apache.flink.client.program.ProgramInvocationException: The program
 execution failed.
at org.apache.flink.client.program.Client.run(Client.java:344)
at org.apache.flink.client.program.Client.run(Client.java:306)
at org.apache.flink.client.program.Client.run(Client.java:300)
at
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
at
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 

Re: Could not build up connection to JobManager

2015-03-02 Thread Dulaj Viduranga

In some places of the code, localhost is hard coded. When it is resolved by 
the DNS, it is posible to be directed  to a different IP other than 127.0.0.1 (like 
private range 10.0.0.0/8). I changed those places to 127.0.0.1 and it works like a charm.
But hard coding 127.0.0.1 is not a good option because when the jobmanager ip 
is changed, this becomes an issue again. I'm thinking of setting jobmanager ip 
from the config.yaml to these places.
If you have a better idea on doing this with your experience, please let me 
know.

Best.


Re: Could not build up connection to JobManager

2015-03-02 Thread Stephan Ewen
Wow, great. Can you tell us what the issue was?
Am 02.03.2015 09:31 schrieb Dulaj Viduranga vidura...@icloud.com:

 Hi,
 I found the fix for this issue and I'll create a pull request in the
 following day.



Re: Could not build up connection to JobManager

2015-02-27 Thread Dulaj Viduranga
Here is the taskmanager log when I tried taskmanager.sh start

flink-Vidura-taskmanager-localhost.log 
https://gist.github.com/anonymous/aef5a0bf8722feee9b97#file-flink-vidura-taskmanager-localhost-log

 On Feb 27, 2015, at 4:12 PM, Till Rohrmann trohrm...@apache.org wrote:
 
 It depends on how you started Flink. If you started a local cluster, then
 the TaskManager log is contained in the JobManager log we just don't see
 the respective log output in the snippet you posted. If you started a
 TaskManager independently, either by taskmanager.sh or by start-cluster.sh,
 then a file with the name format flink-user-taskmanager-hostname.log
 should be created in flink/log/. If the Flink directory is not shared  by
 your cluster nodes, then you have to look on the machine on which you
 started the TaskManager.
 
 But since the JobManager binds to 127.0.0.1 I guess that you started a
 local cluster. Try whether you find some logging statements from the
 logger org.apache.flink.runtime.taskmanager.TaskManager in your log. Maybe
 you can upload the corresponding log file to [1] and post a link here.
 
 Greets,
 
 Till
 
 [1] https://gist.github.com/
 
 On Thu, Feb 26, 2015 at 6:45 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 Hi,
Can you tell me where I can find TaskManager logs. I can’t find
 them in logs folder? I don’t suppose I should run taskmanager.sh as well.
 Right?
I’m using a OS X Yosemite. I’ll send you my ifconfig.
 
 lo0: flags=8049UP,LOOPBACK,RUNNING,MULTICAST mtu 16384
options=3RXCSUM,TXCSUM
inet6 ::1 prefixlen 128
inet 127.0.0.1 netmask 0xff00
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
nd6 options=1PERFORMNUD
 gif0: flags=8010POINTOPOINT,MULTICAST mtu 1280
 stf0: flags=0 mtu 1280
 en0: flags=8823UP,BROADCAST,SMART,SIMPLEX,MULTICAST mtu 1500
ether 60:03:08:a1:e0:f4
nd6 options=1PERFORMNUD
media: autoselect (unknown type)
status: inactive
 en1: flags=8963UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST mtu
 1500
options=60TSO4,TSO6
ether 72:00:02:32:14:d0
media: autoselect full-duplex
status: inactive
 en2: flags=8963UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST mtu
 1500
options=60TSO4,TSO6
ether 72:00:02:32:14:d1
media: autoselect full-duplex
status: inactive
 bridge0: flags=8822BROADCAST,SMART,SIMPLEX,MULTICAST mtu 1500
options=63RXCSUM,TXCSUM,TSO4,TSO6
ether 62:03:08:1a:fa:00
Configuration:
id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
ipfilter disabled flags 0x2
member: en1 flags=3LEARNING,DISCOVER
ifmaxaddr 0 port 5 priority 0 path cost 0
member: en2 flags=3LEARNING,DISCOVER
ifmaxaddr 0 port 6 priority 0 path cost 0
media: unknown type
status: inactive
 p2p0: flags=8802BROADCAST,SIMPLEX,MULTICAST mtu 2304
ether 02:03:08:a1:e0:f4
media: autoselect
status: inactive
 awdl0: flags=8802BROADCAST,SIMPLEX,MULTICAST mtu 1452
ether 06:56:3d:f6:60:08
nd6 options=1PERFORMNUD
media: autoselect
status: inactive
 ppp0: flags=8051UP,POINTOPOINT,RUNNING,MULTICAST mtu 1500
inet 10.218.98.228 -- 10.64.64.64 netmask 0xff00
 utun0: flags=8051UP,POINTOPOINT,RUNNING,MULTICAST mtu 1380
inet6 fe80::b0d4:d4be:7e62:e730%utun0 prefixlen 64 scopeid 0xb
inet6 fdd0:b291:7da7:9153:b0d4:d4be:7e62:e730 prefixlen 64
nd6 options=1PERFORMNUD
 
 
 On Feb 26, 2015, at 10:48 PM, Stephan Ewen se...@apache.org wrote:
 
 Hi Dulaj!
 
 Thanks for helping to debug.
 
 My guess is that you are seeing now the same thing between JobManager and
 TaskManager as you saw before between JobManager and JobClient. I have a
 patch pending that should help the issue (see
 https://issues.apache.org/jira/browse/FLINK-1608), let's see if that
 solves
 it.
 
 What seems not right is that the JobManager initially accepted the
 TaskManager and later the communication. Can you paste the TaskManager
 log
 as well?
 
 Also: There must be something fairly unique about your network
 configuration, as it works on all other setups that we use (locally,
 cloud,
 test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any
 chance?
 
 Greetings,
 Stephan
 
 
 
 On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 Hi,
   It’s great to help out. :)
 
   Setting 127.0.0.1 instead of “localhost” in
 jobmanager.rpc.address, helped to build the connection to the
 jobmanager.
 Apparently localhost resolving is different in webclient and the
 jobmanager. I think it’s good to set jobmanager.rpc.address:
 127.0.0.1 in
 future builds.
   But then I get this error when I tried to run examples. I don’t
 know if I 

Re: Could not build up connection to JobManager

2015-02-27 Thread Till Rohrmann
It depends on how you started Flink. If you started a local cluster, then
the TaskManager log is contained in the JobManager log we just don't see
the respective log output in the snippet you posted. If you started a
TaskManager independently, either by taskmanager.sh or by start-cluster.sh,
then a file with the name format flink-user-taskmanager-hostname.log
should be created in flink/log/. If the Flink directory is not shared  by
your cluster nodes, then you have to look on the machine on which you
started the TaskManager.

But since the JobManager binds to 127.0.0.1 I guess that you started a
local cluster. Try whether you find some logging statements from the
logger org.apache.flink.runtime.taskmanager.TaskManager in your log. Maybe
you can upload the corresponding log file to [1] and post a link here.

Greets,

Till

[1] https://gist.github.com/

On Thu, Feb 26, 2015 at 6:45 PM, Dulaj Viduranga vidura...@icloud.com
wrote:

 Hi,
 Can you tell me where I can find TaskManager logs. I can’t find
 them in logs folder? I don’t suppose I should run taskmanager.sh as well.
 Right?
 I’m using a OS X Yosemite. I’ll send you my ifconfig.

 lo0: flags=8049UP,LOOPBACK,RUNNING,MULTICAST mtu 16384
 options=3RXCSUM,TXCSUM
 inet6 ::1 prefixlen 128
 inet 127.0.0.1 netmask 0xff00
 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
 nd6 options=1PERFORMNUD
 gif0: flags=8010POINTOPOINT,MULTICAST mtu 1280
 stf0: flags=0 mtu 1280
 en0: flags=8823UP,BROADCAST,SMART,SIMPLEX,MULTICAST mtu 1500
 ether 60:03:08:a1:e0:f4
 nd6 options=1PERFORMNUD
 media: autoselect (unknown type)
 status: inactive
 en1: flags=8963UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST mtu
 1500
 options=60TSO4,TSO6
 ether 72:00:02:32:14:d0
 media: autoselect full-duplex
 status: inactive
 en2: flags=8963UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST mtu
 1500
 options=60TSO4,TSO6
 ether 72:00:02:32:14:d1
 media: autoselect full-duplex
 status: inactive
 bridge0: flags=8822BROADCAST,SMART,SIMPLEX,MULTICAST mtu 1500
 options=63RXCSUM,TXCSUM,TSO4,TSO6
 ether 62:03:08:1a:fa:00
 Configuration:
 id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
 maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
 root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
 ipfilter disabled flags 0x2
 member: en1 flags=3LEARNING,DISCOVER
 ifmaxaddr 0 port 5 priority 0 path cost 0
 member: en2 flags=3LEARNING,DISCOVER
 ifmaxaddr 0 port 6 priority 0 path cost 0
 media: unknown type
 status: inactive
 p2p0: flags=8802BROADCAST,SIMPLEX,MULTICAST mtu 2304
 ether 02:03:08:a1:e0:f4
 media: autoselect
 status: inactive
 awdl0: flags=8802BROADCAST,SIMPLEX,MULTICAST mtu 1452
 ether 06:56:3d:f6:60:08
 nd6 options=1PERFORMNUD
 media: autoselect
 status: inactive
 ppp0: flags=8051UP,POINTOPOINT,RUNNING,MULTICAST mtu 1500
 inet 10.218.98.228 -- 10.64.64.64 netmask 0xff00
 utun0: flags=8051UP,POINTOPOINT,RUNNING,MULTICAST mtu 1380
 inet6 fe80::b0d4:d4be:7e62:e730%utun0 prefixlen 64 scopeid 0xb
 inet6 fdd0:b291:7da7:9153:b0d4:d4be:7e62:e730 prefixlen 64
 nd6 options=1PERFORMNUD


  On Feb 26, 2015, at 10:48 PM, Stephan Ewen se...@apache.org wrote:
 
  Hi Dulaj!
 
  Thanks for helping to debug.
 
  My guess is that you are seeing now the same thing between JobManager and
  TaskManager as you saw before between JobManager and JobClient. I have a
  patch pending that should help the issue (see
  https://issues.apache.org/jira/browse/FLINK-1608), let's see if that
 solves
  it.
 
  What seems not right is that the JobManager initially accepted the
  TaskManager and later the communication. Can you paste the TaskManager
 log
  as well?
 
  Also: There must be something fairly unique about your network
  configuration, as it works on all other setups that we use (locally,
 cloud,
  test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any
  chance?
 
  Greetings,
  Stephan
 
 
 
  On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga vidura...@icloud.com
  wrote:
 
  Hi,
 It’s great to help out. :)
 
 Setting 127.0.0.1 instead of “localhost” in
  jobmanager.rpc.address, helped to build the connection to the
 jobmanager.
  Apparently localhost resolving is different in webclient and the
  jobmanager. I think it’s good to set jobmanager.rpc.address:
 127.0.0.1 in
  future builds.
 But then I get this error when I tried to run examples. I don’t
  know if I should move this issue to another thread. If so please tell
 me.
 
  bin/flink run
 
 

Re: Could not build up connection to JobManager

2015-02-26 Thread Dulaj Viduranga
Hi,
It’s great to help out. :)

Setting 127.0.0.1 instead of “localhost” in jobmanager.rpc.address, 
helped to build the connection to the jobmanager. Apparently localhost 
resolving is different in webclient and the jobmanager. I think it’s good to 
set jobmanager.rpc.address: 127.0.0.1 in future builds.
But then I get this error when I tried to run examples. I don’t know if 
I should move this issue to another thread. If so please tell me.

bin/flink run 
/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
 
/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
 $FLINK_DIRECTORY/count


20:46:21,998 WARN  org.apache.hadoop.util.NativeCodeLoader  
 - Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable
02/26/2015 20:46:23 Job execution switched to status RUNNING.
02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) 
(org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at 
main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) 
switched to SCHEDULED 
02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) 
(org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at 
main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) 
switched to DEPLOYING 
02/26/2015 20:48:03 CHAIN DataSource (at getTextDataSet(WordCount.java:141) 
(org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at 
main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1) 
switched to FAILED 
akka.pattern.AskTimeoutException: Ask timed out on 
[Actor[akka://flink/user/taskmanager#-1628133761]] after [10 ms]
at 
akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at 
scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at 
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
at 
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at 
akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at 
akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at 
akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)

02/26/2015 20:48:03 Job execution switched to status FAILING.
02/26/2015 20:48:03 Reduce (SUM(1), at main(WordCount.java:72)(1/1) 
switched to CANCELED 
02/26/2015 20:48:03 DataSink(CsvOutputFormat (path: 
/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count,
 delimiter:  ))(1/1) switched to CANCELED 
02/26/2015 20:48:03 Job execution switched to status FAILED.
org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed.
at org.apache.flink.client.program.Client.run(Client.java:344)
at org.apache.flink.client.program.Client.run(Client.java:306)
at org.apache.flink.client.program.Client.run(Client.java:300)
at 
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
at 
org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:250)
at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution 
failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:284)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 

Re: Could not build up connection to JobManager

2015-02-26 Thread Stephan Ewen
Hi Dulaj!

Thanks for helping to debug.

My guess is that you are seeing now the same thing between JobManager and
TaskManager as you saw before between JobManager and JobClient. I have a
patch pending that should help the issue (see
https://issues.apache.org/jira/browse/FLINK-1608), let's see if that solves
it.

What seems not right is that the JobManager initially accepted the
TaskManager and later the communication. Can you paste the TaskManager log
as well?

Also: There must be something fairly unique about your network
configuration, as it works on all other setups that we use (locally, cloud,
test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any
chance?

Greetings,
Stephan



On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga vidura...@icloud.com
wrote:

 Hi,
 It’s great to help out. :)

 Setting 127.0.0.1 instead of “localhost” in
 jobmanager.rpc.address, helped to build the connection to the jobmanager.
 Apparently localhost resolving is different in webclient and the
 jobmanager. I think it’s good to set jobmanager.rpc.address: 127.0.0.1 in
 future builds.
 But then I get this error when I tried to run examples. I don’t
 know if I should move this issue to another thread. If so please tell me.

 bin/flink run
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
 $FLINK_DIRECTORY/count


 20:46:21,998 WARN  org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
 builtin-java classes where applicable
 02/26/2015 20:46:23 Job execution switched to status RUNNING.
 02/26/2015 20:46:23 CHAIN DataSource (at
 getTextDataSet(WordCount.java:141)
 (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at
 main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1)
 switched to SCHEDULED
 02/26/2015 20:46:23 CHAIN DataSource (at
 getTextDataSet(WordCount.java:141)
 (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at
 main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1)
 switched to DEPLOYING
 02/26/2015 20:48:03 CHAIN DataSource (at
 getTextDataSet(WordCount.java:141)
 (org.apache.flink.api.java.io.TextInputFormat)) - FlatMap (FlatMap at
 main(WordCount.java:69)) - Combine(SUM(1), at main(WordCount.java:72)(1/1)
 switched to FAILED
 akka.pattern.AskTimeoutException: Ask timed out on
 [Actor[akka://flink/user/taskmanager#-1628133761]] after [10 ms]
 at
 akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
 at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
 at
 scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
 at
 scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
 at
 akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
 at
 akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
 at
 akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
 at
 akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
 at java.lang.Thread.run(Thread.java:745)

 02/26/2015 20:48:03 Job execution switched to status FAILING.
 02/26/2015 20:48:03 Reduce (SUM(1), at main(WordCount.java:72)(1/1)
 switched to CANCELED
 02/26/2015 20:48:03 DataSink(CsvOutputFormat (path:
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count,
 delimiter:  ))(1/1) switched to CANCELED
 02/26/2015 20:48:03 Job execution switched to status FAILED.
 org.apache.flink.client.program.ProgramInvocationException: The program
 execution failed.
 at org.apache.flink.client.program.Client.run(Client.java:344)
 at org.apache.flink.client.program.Client.run(Client.java:306)
 at org.apache.flink.client.program.Client.run(Client.java:300)
 at
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
 at
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
 at
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 at 

Re: Could not build up connection to JobManager

2015-02-25 Thread Dulaj Viduranga
Hi,
Sorry for the delay to reply on this issue.
the jobmanager.rpc.address is set to “localhost” already in conf.yaml.
This can’t be an issue because the job manager web interface works fine which 
also runs on localhost

 bin/flink run jar doesn’t seem to work either. Let me send you my command 
and the result in terminal.

bin/flink run 
/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
 
/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
 $FLINK_DIRECTORY/count

20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader  
 - Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable
org.apache.flink.client.program.ProgramInvocationException: Could not build up 
connection to JobManager.
at org.apache.flink.client.program.Client.run(Client.java:327)
at org.apache.flink.client.program.Client.run(Client.java:306)
at org.apache.flink.client.program.Client.run(Client.java:300)
at 
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
at 
org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:250)
at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
Caused by: java.io.IOException: JobManager at 
akka.tcp://flink@10.216.177.146:6123/user/jobmanager not reachable. Please make 
sure that the JobManager is running and its port is reachable.
at 
org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
at 
org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
at 
org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
at 
org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
at 
org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
at org.apache.flink.client.program.Client.run(Client.java:322)
... 15 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
[1 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at 
org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
... 20 more

The exception above occurred while trying to run your command.


 On Feb 25, 2015, at 1:29 AM, Stephan Ewen se...@apache.org wrote:
 
 BTW: Does still work if you enter localhost for jobmanager.rpc.address
 in your flink-conf.yaml ?
 
 On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen se...@apache.org wrote:
 
 Hi!
 
 I think that this is a problem in the current master (probably in there
 since a few days ago). I am fixing it...
 
 Thanks for reporting it!
 
 Stephan
 
 
 On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen se...@apache.org wrote:
 
 Hi Dulaj!
 
 The log suggests that the JobManager binds itself to the IP
 address 10.216.192.98 and the WebClient runs at 127.0.0.1
 
 The 127.0.0.1 actor system cannot connect to the 10.216.192.98.
 
 Let me verify whether this is a quirk of your particular setup, or a bug
 recently introduces in the 0.9-SNAPSHOT.
 
 Does the command line work for you? (bin/flink run jar)
 
 taskmanager.numberOfTaskSlots: -1  is also okay, this will mean that the
 default of '1' is used.
 
 Greetings,
 Stephan
 
 
 
 On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 Is taskmanager.numberOfTaskSlots: -1 normal?
 
 On Feb 24, 2015, at 9:44 PM, Robert Metzger rmetz...@apache.org
 wrote:
 
 Hi,
 I could not find the logfiles attached to your mails. I think 

Re: Could not build up connection to JobManager

2015-02-25 Thread Stephan Ewen
Addition: To check whether a port is reachable, I think the easiest thing
is to try and connect with a telnet client and see if the connection is
refused.

On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen se...@apache.org wrote:

 Okay, the problem seems to be that even though both the client and the
 jobmanager use localhost as the host name, they resolve this to different
 IP addresses: In one case 127.0.0.1 in the other case 10.216.177.146

  Also, the 127.0.0.1 address cannot communicate to 10.216.177.146
 apparently.

 Can you help us debug this by checking the following:

  - Can you try and set jobmanager.rpc.address to 127.0.0.1 and see if
 that solves it?
  - Can you try and set jobmanager.rpc.address to the other address 
 (10.216.177.146
 or so) and see if that solves it?
  - Can you do start-cluster.sh, rather than start-local.sh and see
 whether the webfrontend displays that the TaskManager connects?
  - As a hard core test: Can you bring up the jobmanager, check where it
 connects (10.216.192.98:6123 or so) and see whether the port is reachable?

 We have recently updated how the Akka URLs are build, to work around a
 limitation in Akka. Seems that did not yet fully solve the issue.

 Thanks for helping us debug this, it is not the easiest immigration
 experience, but the outcome is probably extremely valuable for the project
 :-)

 Greetings,
 Stephan


 On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:

 Hi,
 Sorry for the delay to reply on this issue.
 the jobmanager.rpc.address is set to “localhost” already in conf.yaml.
 This can’t be an issue because the job manager web interface works fine
 which also runs on localhost

  bin/flink run jar doesn’t seem to work either. Let me send you my
 command and the result in terminal.

 bin/flink run
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
 $FLINK_DIRECTORY/count

 20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
 builtin-java classes where applicable
 org.apache.flink.client.program.ProgramInvocationException: Could not
 build up connection to JobManager.
 at org.apache.flink.client.program.Client.run(Client.java:327)
 at org.apache.flink.client.program.Client.run(Client.java:306)
 at org.apache.flink.client.program.Client.run(Client.java:300)
 at
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
 at
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
 at
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 at org.apache.flink.client.program.Client.run(Client.java:250)
 at
 org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
 at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
 at
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
 at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
 Caused by: java.io.IOException: JobManager at akka.tcp://
 flink@10.216.177.146:6123/user/jobmanager not reachable. Please make
 sure that the JobManager is running and its port is reachable.
 at
 org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
 at
 org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
 at
 org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
 at
 org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
 at
 org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
 at org.apache.flink.client.program.Client.run(Client.java:322)
 ... 15 more
 Caused by: java.util.concurrent.TimeoutException: Futures timed out after
 [1 milliseconds]
 at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 at
 scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 at
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at 

Re: Could not build up connection to JobManager

2015-02-25 Thread Stephan Ewen
Okay, the problem seems to be that even though both the client and the
jobmanager use localhost as the host name, they resolve this to different
IP addresses: In one case 127.0.0.1 in the other case 10.216.177.146

 Also, the 127.0.0.1 address cannot communicate to 10.216.177.146
apparently.

Can you help us debug this by checking the following:

 - Can you try and set jobmanager.rpc.address to 127.0.0.1 and see if
that solves it?
 - Can you try and set jobmanager.rpc.address to the other address
(10.216.177.146
or so) and see if that solves it?
 - Can you do start-cluster.sh, rather than start-local.sh and see
whether the webfrontend displays that the TaskManager connects?
 - As a hard core test: Can you bring up the jobmanager, check where it
connects (10.216.192.98:6123 or so) and see whether the port is reachable?

We have recently updated how the Akka URLs are build, to work around a
limitation in Akka. Seems that did not yet fully solve the issue.

Thanks for helping us debug this, it is not the easiest immigration
experience, but the outcome is probably extremely valuable for the project
:-)

Greetings,
Stephan


On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga vidura...@icloud.com
wrote:

 Hi,
 Sorry for the delay to reply on this issue.
 the jobmanager.rpc.address is set to “localhost” already in conf.yaml.
 This can’t be an issue because the job manager web interface works fine
 which also runs on localhost

  bin/flink run jar doesn’t seem to work either. Let me send you my
 command and the result in terminal.

 bin/flink run
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
 /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
 $FLINK_DIRECTORY/count

 20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
 builtin-java classes where applicable
 org.apache.flink.client.program.ProgramInvocationException: Could not
 build up connection to JobManager.
 at org.apache.flink.client.program.Client.run(Client.java:327)
 at org.apache.flink.client.program.Client.run(Client.java:306)
 at org.apache.flink.client.program.Client.run(Client.java:300)
 at
 org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
 at
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
 at
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 at org.apache.flink.client.program.Client.run(Client.java:250)
 at
 org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
 at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
 at
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
 at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
 Caused by: java.io.IOException: JobManager at akka.tcp://
 flink@10.216.177.146:6123/user/jobmanager not reachable. Please make sure
 that the JobManager is running and its port is reachable.
 at
 org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
 at
 org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
 at
 org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
 at
 org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
 at
 org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
 at org.apache.flink.client.program.Client.run(Client.java:322)
 ... 15 more
 Caused by: java.util.concurrent.TimeoutException: Futures timed out after
 [1 milliseconds]
 at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 at
 scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 at
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at scala.concurrent.Await$.result(package.scala:107)
 at
 org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
 ... 20 more

 The exception above occurred while trying to run your command.


  On Feb 25, 2015, 

Re: Could not build up connection to JobManager

2015-02-24 Thread Stephan Ewen
BTW: Does still work if you enter localhost for jobmanager.rpc.address
in your flink-conf.yaml ?

On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen se...@apache.org wrote:

 Hi!

 I think that this is a problem in the current master (probably in there
 since a few days ago). I am fixing it...

 Thanks for reporting it!

 Stephan


 On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen se...@apache.org wrote:

 Hi Dulaj!

 The log suggests that the JobManager binds itself to the IP
 address 10.216.192.98 and the WebClient runs at 127.0.0.1

 The 127.0.0.1 actor system cannot connect to the 10.216.192.98.

 Let me verify whether this is a quirk of your particular setup, or a bug
 recently introduces in the 0.9-SNAPSHOT.

 Does the command line work for you? (bin/flink run jar)

 taskmanager.numberOfTaskSlots: -1  is also okay, this will mean that the
 default of '1' is used.

 Greetings,
 Stephan



 On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:

 Is taskmanager.numberOfTaskSlots: -1 normal?

  On Feb 24, 2015, at 9:44 PM, Robert Metzger rmetz...@apache.org
 wrote:
 
  Hi,
  I could not find the logfiles attached to your mails. I think the
  mailinglists are not accepting attachments.
  Can you put the logs on gist.github.com?
 
  The configuration values are documented here:
  http://flink.apache.org/docs/0.8/config.html
  For the webclient's port its called webclient.port
 
  On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga vidura...@icloud.com
 
  wrote:
 
  I tried to kill the job manager manually in the terminal and start it
  again but no luck. Also could you tell me if it’s possible to change
  webclient’s port (8080) ?
 
  On Feb 24, 2015, at 1:41 PM, Stephan Ewen se...@apache.org wrote:
 
  Hey Dulaj!
 
  As a contributor, I would go against the latest version, which is
  0.9-SNAPSHOT.
 
  It may be in your case that the JobManager actor is down, but the
 process
  still lingers. (BTW: I have a patch pending that makes sure the
 process
  disappears when the actor via down).
 
  Could you have a look at the log
 flink-user-jobmanager-host-.log
  and
  see if there are any errors logged?
 
  Greetings,
  Stephan
  Am 24.02.2015 06:29 schrieb Dulaj Viduranga vidura...@icloud.com
 :
 
  The JobManager seems to run fine. I don't know. When I tried to run
  start-local.sh again, It shows the PID of the running JobManager and
  also
  :8081 runs fine. I want to contribute to the project and I could
 get a
  little boost if I could see the capabilities of FLINK. :)
  Will it be OK to use 0.8.1 as a developer?
 
  On Feb 24, 2015, at 04:15 AM, Stephan Ewen se...@apache.org
 wrote:
 
  Hi Dulaj,
 
  That error message indicates that the JobManager is not running.
 Are you
  sure that the JobManager runs properly? Anything in the JobManager
 logs?
 
  BTW: The 0.9 branch is under heavy development / changes. That is
 why it
  may behave a bit different on different days right now. I would
  recommend
  to use the 0.8.1 release for a stable experience.
 
  Greetings,
  Stephan
 
 
  On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger 
 rmetz...@apache.org
  wrote:
 
  Thank you for the quick reply.
 
  The log you've send is from the webclient. Can you also send the
 log of
  the
 
  JobManager?
 
  On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga 
 vidura...@icloud.com
 
  wrote:
 
  Yes. It seams it is not a problem with the arguments. I tried two
 days
 
  but
 
  different error occurs. It seams the web client can’t connect to
 the
  job
 
  manager although it is running
 
  Right now, I can’t even get the webclient to run.
 
  ./bin/start-webclient.sh
 
  executes fine but I cannot connect to localhost:8080 (even with
 telnet
  or
 
  curl)
 
  Here is the log for jobManager
 
 
 
  23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer
 
  - Setting up web frontend server, using web-root directory
 
 
 
  'jar:
 
 
 file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs
  '.
 
  23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
 
  - Web frontend server will store temporary files in
 
  '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T', uploaded jobs
 in
 
  '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-jobs',
 
  plan-json-dumps in
 
  '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-plans'.
 
  23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
 
  - Web-frontend will submit jobs to nephele job-manager on
 
  localhost,
 
  port 6123.
 
  23:22:32,580 INFO akka.event.slf4j.Slf4jLogger
 
  - Slf4jLogger started
 
  23:22:32,625 INFO Remoting
 
  - Starting remoting
 
  23:22:32,838 INFO Remoting
 
  - Remoting started; listening on addresses :[akka.tcp://
 
 
  JobsInfoServletActorSystem@127.0.0.1:51517]
 
  23:23:48,119 WARN Remoting
 
  - Tried to associate with unreachable remote address [akka.tcp://
 
 
  

Re: Could not build up connection to JobManager

2015-02-24 Thread Robert Metzger
Hi,
I could not find the logfiles attached to your mails. I think the
mailinglists are not accepting attachments.
Can you put the logs on gist.github.com?

The configuration values are documented here:
http://flink.apache.org/docs/0.8/config.html
For the webclient's port its called webclient.port

On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga vidura...@icloud.com
wrote:

 I tried to kill the job manager manually in the terminal and start it
 again but no luck. Also could you tell me if it’s possible to change
 webclient’s port (8080) ?

  On Feb 24, 2015, at 1:41 PM, Stephan Ewen se...@apache.org wrote:
 
  Hey Dulaj!
 
  As a contributor, I would go against the latest version, which is
  0.9-SNAPSHOT.
 
  It may be in your case that the JobManager actor is down, but the process
  still lingers. (BTW: I have a patch pending that makes sure the process
  disappears when the actor via down).
 
  Could you have a look at the log flink-user-jobmanager-host-.log
 and
  see if there are any errors logged?
 
  Greetings,
  Stephan
  Am 24.02.2015 06:29 schrieb Dulaj Viduranga vidura...@icloud.com:
 
  The JobManager seems to run fine. I don't know. When I tried to run
  start-local.sh again, It shows the PID of the running JobManager and
 also
  :8081 runs fine. I want to contribute to the project and I could get a
  little boost if I could see the capabilities of FLINK. :)
  Will it be OK to use 0.8.1 as a developer?
 
  On Feb 24, 2015, at 04:15 AM, Stephan Ewen se...@apache.org wrote:
 
  Hi Dulaj,
 
  That error message indicates that the JobManager is not running. Are you
  sure that the JobManager runs properly? Anything in the JobManager logs?
 
  BTW: The 0.9 branch is under heavy development / changes. That is why it
  may behave a bit different on different days right now. I would
 recommend
  to use the 0.8.1 release for a stable experience.
 
  Greetings,
  Stephan
 
 
  On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger rmetz...@apache.org
  wrote:
 
  Thank you for the quick reply.
 
  The log you've send is from the webclient. Can you also send the log of
 the
 
  JobManager?
 
  On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga vidura...@icloud.com
 
  wrote:
 
  Yes. It seams it is not a problem with the arguments. I tried two days
 
  but
 
  different error occurs. It seams the web client can’t connect to the
 job
 
  manager although it is running
 
  Right now, I can’t even get the webclient to run.
 
  ./bin/start-webclient.sh
 
  executes fine but I cannot connect to localhost:8080 (even with telnet
 or
 
  curl)
 
  Here is the log for jobManager
 
 
 
  23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer
 
  - Setting up web frontend server, using web-root directory
 
 
 
  'jar:
 
 file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs
  '.
 
  23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
 
  - Web frontend server will store temporary files in
 
  '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T', uploaded jobs in
 
  '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-jobs',
 
  plan-json-dumps in
 
  '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-plans'.
 
  23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
 
  - Web-frontend will submit jobs to nephele job-manager on
 
  localhost,
 
  port 6123.
 
  23:22:32,580 INFO akka.event.slf4j.Slf4jLogger
 
  - Slf4jLogger started
 
  23:22:32,625 INFO Remoting
 
  - Starting remoting
 
  23:22:32,838 INFO Remoting
 
  - Remoting started; listening on addresses :[akka.tcp://
 
 
  JobsInfoServletActorSystem@127.0.0.1:51517]
 
  23:23:48,119 WARN Remoting
 
  - Tried to associate with unreachable remote address [akka.tcp://
 
 
  flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all
 
  messages
 
  to this address will be delivered to dead letters. Reason: Operation
 
  timed
 
  out: /10.218.98.169:6123
 
  23:23:48,124 ERROR org.apache.flink.client.WebFrontend
 
  - Unexpected exception: Could not find job manager at specified
 
  address akka.flink@10.218.98.169:6123/user/jobmanager'tcp://
  flink@10.218.98.169:6123/user/jobmanager.
 
  java.lang.RuntimeException: Could not find job manager at specified
 
  address akka.flink@10.218.98.169:6123/user/jobmanager'tcp://
  flink@10.218.98.169:6123/user/jobmanager.
 
  at
 
 
 
 
 org.apache.flink.client.web.JobsInfoServlet.init(JobsInfoServlet.java:82)
 
  at
 
 
 
 
 
 org.apache.flink.client.web.WebInterfaceServer.init(WebInterfaceServer.java:158)
 
  at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
 
 
 
 
 
  On Feb 23, 2015, at 11:46 PM, Robert Metzger rmetz...@apache.org
 
  wrote:
 
 
 
  Hi,
 
  you said in the other email thread that the error only occurs for
 
  Wordcount, not for Kmeans.
 
  Can you copy me the commands for both examples?
 
  I can not really believe that there is a difference 

Re: Could not build up connection to JobManager

2015-02-24 Thread Dulaj Viduranga
Is taskmanager.numberOfTaskSlots: -1 normal?

 On Feb 24, 2015, at 9:44 PM, Robert Metzger rmetz...@apache.org wrote:
 
 Hi,
 I could not find the logfiles attached to your mails. I think the
 mailinglists are not accepting attachments.
 Can you put the logs on gist.github.com?
 
 The configuration values are documented here:
 http://flink.apache.org/docs/0.8/config.html
 For the webclient's port its called webclient.port
 
 On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:
 
 I tried to kill the job manager manually in the terminal and start it
 again but no luck. Also could you tell me if it’s possible to change
 webclient’s port (8080) ?
 
 On Feb 24, 2015, at 1:41 PM, Stephan Ewen se...@apache.org wrote:
 
 Hey Dulaj!
 
 As a contributor, I would go against the latest version, which is
 0.9-SNAPSHOT.
 
 It may be in your case that the JobManager actor is down, but the process
 still lingers. (BTW: I have a patch pending that makes sure the process
 disappears when the actor via down).
 
 Could you have a look at the log flink-user-jobmanager-host-.log
 and
 see if there are any errors logged?
 
 Greetings,
 Stephan
 Am 24.02.2015 06:29 schrieb Dulaj Viduranga vidura...@icloud.com:
 
 The JobManager seems to run fine. I don't know. When I tried to run
 start-local.sh again, It shows the PID of the running JobManager and
 also
 :8081 runs fine. I want to contribute to the project and I could get a
 little boost if I could see the capabilities of FLINK. :)
 Will it be OK to use 0.8.1 as a developer?
 
 On Feb 24, 2015, at 04:15 AM, Stephan Ewen se...@apache.org wrote:
 
 Hi Dulaj,
 
 That error message indicates that the JobManager is not running. Are you
 sure that the JobManager runs properly? Anything in the JobManager logs?
 
 BTW: The 0.9 branch is under heavy development / changes. That is why it
 may behave a bit different on different days right now. I would
 recommend
 to use the 0.8.1 release for a stable experience.
 
 Greetings,
 Stephan
 
 
 On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger rmetz...@apache.org
 wrote:
 
 Thank you for the quick reply.
 
 The log you've send is from the webclient. Can you also send the log of
 the
 
 JobManager?
 
 On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga vidura...@icloud.com
 
 wrote:
 
 Yes. It seams it is not a problem with the arguments. I tried two days
 
 but
 
 different error occurs. It seams the web client can’t connect to the
 job
 
 manager although it is running
 
 Right now, I can’t even get the webclient to run.
 
 ./bin/start-webclient.sh
 
 executes fine but I cannot connect to localhost:8080 (even with telnet
 or
 
 curl)
 
 Here is the log for jobManager
 
 
 
 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer
 
 - Setting up web frontend server, using web-root directory
 
 
 
 'jar:
 
 file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs
 '.
 
 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
 
 - Web frontend server will store temporary files in
 
 '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T', uploaded jobs in
 
 '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-jobs',
 
 plan-json-dumps in
 
 '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-plans'.
 
 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
 
 - Web-frontend will submit jobs to nephele job-manager on
 
 localhost,
 
 port 6123.
 
 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger
 
 - Slf4jLogger started
 
 23:22:32,625 INFO Remoting
 
 - Starting remoting
 
 23:22:32,838 INFO Remoting
 
 - Remoting started; listening on addresses :[akka.tcp://
 
 
 JobsInfoServletActorSystem@127.0.0.1:51517]
 
 23:23:48,119 WARN Remoting
 
 - Tried to associate with unreachable remote address [akka.tcp://
 
 
 flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all
 
 messages
 
 to this address will be delivered to dead letters. Reason: Operation
 
 timed
 
 out: /10.218.98.169:6123
 
 23:23:48,124 ERROR org.apache.flink.client.WebFrontend
 
 - Unexpected exception: Could not find job manager at specified
 
 address akka.flink@10.218.98.169:6123/user/jobmanager'tcp://
 flink@10.218.98.169:6123/user/jobmanager.
 
 java.lang.RuntimeException: Could not find job manager at specified
 
 address akka.flink@10.218.98.169:6123/user/jobmanager'tcp://
 flink@10.218.98.169:6123/user/jobmanager.
 
 at
 
 
 
 
 org.apache.flink.client.web.JobsInfoServlet.init(JobsInfoServlet.java:82)
 
 at
 
 
 
 
 
 org.apache.flink.client.web.WebInterfaceServer.init(WebInterfaceServer.java:158)
 
 at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
 
 
 
 
 
 On Feb 23, 2015, at 11:46 PM, Robert Metzger rmetz...@apache.org
 
 wrote:
 
 
 
 Hi,
 
 you said in the other email thread that the error only occurs for
 
 Wordcount, not for Kmeans.
 
 Can you copy me the commands for both examples?
 
 I can 

Re: Could not build up connection to JobManager

2015-02-24 Thread Stephan Ewen
Hi!

I think that this is a problem in the current master (probably in there
since a few days ago). I am fixing it...

Thanks for reporting it!

Stephan


On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen se...@apache.org wrote:

 Hi Dulaj!

 The log suggests that the JobManager binds itself to the IP
 address 10.216.192.98 and the WebClient runs at 127.0.0.1

 The 127.0.0.1 actor system cannot connect to the 10.216.192.98.

 Let me verify whether this is a quirk of your particular setup, or a bug
 recently introduces in the 0.9-SNAPSHOT.

 Does the command line work for you? (bin/flink run jar)

 taskmanager.numberOfTaskSlots: -1  is also okay, this will mean that the
 default of '1' is used.

 Greetings,
 Stephan



 On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga vidura...@icloud.com
 wrote:

 Is taskmanager.numberOfTaskSlots: -1 normal?

  On Feb 24, 2015, at 9:44 PM, Robert Metzger rmetz...@apache.org
 wrote:
 
  Hi,
  I could not find the logfiles attached to your mails. I think the
  mailinglists are not accepting attachments.
  Can you put the logs on gist.github.com?
 
  The configuration values are documented here:
  http://flink.apache.org/docs/0.8/config.html
  For the webclient's port its called webclient.port
 
  On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga vidura...@icloud.com
  wrote:
 
  I tried to kill the job manager manually in the terminal and start it
  again but no luck. Also could you tell me if it’s possible to change
  webclient’s port (8080) ?
 
  On Feb 24, 2015, at 1:41 PM, Stephan Ewen se...@apache.org wrote:
 
  Hey Dulaj!
 
  As a contributor, I would go against the latest version, which is
  0.9-SNAPSHOT.
 
  It may be in your case that the JobManager actor is down, but the
 process
  still lingers. (BTW: I have a patch pending that makes sure the
 process
  disappears when the actor via down).
 
  Could you have a look at the log flink-user-jobmanager-host-.log
  and
  see if there are any errors logged?
 
  Greetings,
  Stephan
  Am 24.02.2015 06:29 schrieb Dulaj Viduranga vidura...@icloud.com:
 
  The JobManager seems to run fine. I don't know. When I tried to run
  start-local.sh again, It shows the PID of the running JobManager and
  also
  :8081 runs fine. I want to contribute to the project and I could get
 a
  little boost if I could see the capabilities of FLINK. :)
  Will it be OK to use 0.8.1 as a developer?
 
  On Feb 24, 2015, at 04:15 AM, Stephan Ewen se...@apache.org wrote:
 
  Hi Dulaj,
 
  That error message indicates that the JobManager is not running. Are
 you
  sure that the JobManager runs properly? Anything in the JobManager
 logs?
 
  BTW: The 0.9 branch is under heavy development / changes. That is
 why it
  may behave a bit different on different days right now. I would
  recommend
  to use the 0.8.1 release for a stable experience.
 
  Greetings,
  Stephan
 
 
  On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger rmetz...@apache.org
 
  wrote:
 
  Thank you for the quick reply.
 
  The log you've send is from the webclient. Can you also send the log
 of
  the
 
  JobManager?
 
  On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga 
 vidura...@icloud.com
 
  wrote:
 
  Yes. It seams it is not a problem with the arguments. I tried two
 days
 
  but
 
  different error occurs. It seams the web client can’t connect to the
  job
 
  manager although it is running
 
  Right now, I can’t even get the webclient to run.
 
  ./bin/start-webclient.sh
 
  executes fine but I cannot connect to localhost:8080 (even with
 telnet
  or
 
  curl)
 
  Here is the log for jobManager
 
 
 
  23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer
 
  - Setting up web frontend server, using web-root directory
 
 
 
  'jar:
 
 
 file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs
  '.
 
  23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
 
  - Web frontend server will store temporary files in
 
  '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T', uploaded jobs in
 
  '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-jobs',
 
  plan-json-dumps in
 
  '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2cgn/T/webclient-plans'.
 
  23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
 
  - Web-frontend will submit jobs to nephele job-manager on
 
  localhost,
 
  port 6123.
 
  23:22:32,580 INFO akka.event.slf4j.Slf4jLogger
 
  - Slf4jLogger started
 
  23:22:32,625 INFO Remoting
 
  - Starting remoting
 
  23:22:32,838 INFO Remoting
 
  - Remoting started; listening on addresses :[akka.tcp://
 
 
  JobsInfoServletActorSystem@127.0.0.1:51517]
 
  23:23:48,119 WARN Remoting
 
  - Tried to associate with unreachable remote address [akka.tcp://
 
 
  flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all
 
  messages
 
  to this address will be delivered to dead letters. Reason: Operation
 
  timed
 
  out: /10.218.98.169:6123
 
  

Re: Could not build up connection to JobManager

2015-02-23 Thread Robert Metzger
Hi,
you said in the other email thread that the error only occurs for
Wordcount, not for Kmeans.
Can you copy me the commands for both examples?
I can not really believe that there is a difference between the two jobs.

Can you also send us the contents of the jobmanager log file?

Best,
Robert


On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga vidura...@icloud.com
wrote:

 I’m getting Could not build up connection to JobManager.” When i tried to
 run the wordCount example. Can anyone help?

 Dulaj