Re: ResourceManager not using correct akka URI in standalone cluster (?)

2016-09-28 Thread AJ Heller
Thank you, Till. I was in a time crunch and rebuilt my cluster from the
ground up with Hadoop installed. Everything works now:
`netstat -pn | grep 6123` shows Flink's PID. Hadoop may be irrelevant; I
can't rule out PEBKAC yet :-). Sorry, when I have time I'll attempt to
reproduce the scenario, on the off chance there's a bug in there I can help
dig up.

Best,
aj


Re: ResourceManager not using correct akka URI in standalone cluster (?)

2016-09-20 Thread Till Rohrmann
Hi,

could you check what happened to your TaskManagers in the logs? There seems
to be a problem with the connection of the TMs to the JM.

You're right that you don't strictly need HDFS to run a Flink job as long
as you don't want to access HDFS data or write to HDFS.

`netstat -atn` should list all TCP sockets currently in use. A socket
bound to port 6123 should be among them.

Cheers,
Till

On Thu, Sep 15, 2016 at 11:20 PM, AJ Heller wrote:

> More information:
>
> From the master node, I cannot `telnet localhost 6123` nor
> `telnet <IP> 6123` while the cluster is apparently running. The
> connection is refused immediately. `netstat -n | grep 6123` is empty.
> There's no server listening. But the processes are running on all machines.
>
> Does it matter that I don't have Hadoop or HDFS installed? It is optional,
> right? To be clear, this fails at startup, long before I'm able to run any
> job.
>
> On Amazon EC2, the machines know of their private IPs, but not their
> public IPs. I've instructed the cluster to operate over the public network
> because I couldn't get the private IP scenario working.
>
> Running `./bin/start-local.sh` shows non-zero counts in the Flink
> Dashboard. Cluster setups show zero-counts all around.
>
> -aj
>
> On Thu, Sep 15, 2016 at 12:41 PM, AJ Heller wrote:
>
>> I'm running a standalone cluster on Amazon EC2. Leader election is
>> happening according to the logs, and the Flink Dashboard is up and running,
>> accessible remotely. The issue I'm having is that the SocketWordCount
>> example is not working: the local connection is being refused!
>>
>> In the Flink Dashboard, 0 task managers are being reported. And in the
>> jobmanager logs, the last line indicates "leader session null". All other
>> akka URIs in the log file begin "akka.tcp://flink@PUBLIC_IP/...", but
>> the ResourceManager URI begins "akka://flink/...".
>>
>>
>> jobmanager log:
>> http://pastebin.com/VWJM8XvW
>>
>> client log:
>> http://pastebin.com/ZrWsbcwa
>>
>> flink-conf.yaml:
>> http://pastebin.com/xy2tz7WS
>>
>> The master and slave files are populated with public IPs as well.
>>
>
>


Re: ResourceManager not using correct akka URI in standalone cluster (?)

2016-09-15 Thread AJ Heller
More information:

From the master node, I cannot `telnet localhost 6123` nor
`telnet <IP> 6123` while the cluster is apparently running. The connection
is refused immediately. `netstat -n | grep 6123` is empty. There's no server
listening. But the processes are running on all machines.
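
The `telnet` check described above can also be scripted, which is handy when
several hosts need testing. What follows is a minimal sketch in Python, not
something from the original thread: the helper name `port_is_open` is
hypothetical, and 6123 is simply the port probed above. It only reports
whether a TCP connection is accepted, not which process owns the socket.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) is accepted."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 6123 is the port checked with telnet in this thread.
    for host in ("localhost",):
        state = "open" if port_is_open(host, 6123) else "closed or unreachable"
        print(f"{host}:6123 is {state}")
```

Adding further host names or addresses to the tuple in the loop would repeat
the same check from the master node against each of them.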

Does it matter that I don't have Hadoop or HDFS installed? It is optional,
right? To be clear, this fails at startup, long before I'm able to run any
job.

On Amazon EC2, the machines know of their private IPs, but not their public
IPs. I've instructed the cluster to operate over the public network because
I couldn't get the private IP scenario working.

Running `./bin/start-local.sh` shows non-zero counts in the Flink
Dashboard. Cluster setups show zero-counts all around.

-aj

On Thu, Sep 15, 2016 at 12:41 PM, AJ Heller wrote:

> I'm running a standalone cluster on Amazon EC2. Leader election is
> happening according to the logs, and the Flink Dashboard is up and running,
> accessible remotely. The issue I'm having is that the SocketWordCount
> example is not working: the local connection is being refused!
>
> In the Flink Dashboard, 0 task managers are being reported. And in the
> jobmanager logs, the last line indicates "leader session null". All other
> akka URIs in the log file begin "akka.tcp://flink@PUBLIC_IP/...", but the
> ResourceManager URI begins "akka://flink/...".
>
>
> jobmanager log:
> http://pastebin.com/VWJM8XvW
>
> client log:
> http://pastebin.com/ZrWsbcwa
>
> flink-conf.yaml:
> http://pastebin.com/xy2tz7WS
>
> The master and slave files are populated with public IPs as well.
>


ResourceManager not using correct akka URI in standalone cluster (?)

2016-09-15 Thread AJ Heller
I'm running a standalone cluster on Amazon EC2. Leader election is
happening according to the logs, and the Flink Dashboard is up and running,
accessible remotely. The issue I'm having is that the SocketWordCount
example is not working: the local connection is being refused!

In the Flink Dashboard, 0 task managers are being reported. And in the
jobmanager logs, the last line indicates "leader session null". All other
akka URIs in the log file begin "akka.tcp://flink@PUBLIC_IP/...", but the
ResourceManager URI begins "akka://flink/...".
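
To make the difference between the two URI forms concrete: in Akka's classic
remoting, a path such as `akka.tcp://flink@HOST:PORT/...` carries a transport
and an address, while `akka://flink/...` is a local actor path with no host.
The Python sketch below sorts the akka URIs found in a piece of text into
those two groups. The helper name `split_akka_uris` is hypothetical and the
sample lines are invented for illustration (203.0.113.10 is a documentation
address), so nothing here reflects Flink's real log format, and a local path
is not necessarily a sign of a problem.

```python
import re

# Matches akka://system/... (local) and akka.tcp://system@host[:port]/... (remote).
AKKA_URI = re.compile(
    r"akka(?:\.(?P<transport>\w+))?://(?P<system>[\w-]+)"
    r"(?:@(?P<address>[^/\s]+))?(?P<path>/\S*)?"
)


def split_akka_uris(text):
    """Return (remote, local) lists of akka URIs found in text."""
    remote, local = [], []
    for match in AKKA_URI.finditer(text):
        # A URI counts as remote only if it carries an @host[:port] part.
        (remote if match.group("address") else local).append(match.group(0))
    return remote, local


# Invented sample lines, not taken from the logs linked in this thread.
sample = (
    "JobManager at akka.tcp://flink@203.0.113.10:6123/user/jobmanager\n"
    "ResourceManager at akka://flink/user/resourcemanager\n"
)
remote, local = split_akka_uris(sample)
print("remote:", remote)
print("local:", local)
```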


jobmanager log:
http://pastebin.com/VWJM8XvW

client log:
http://pastebin.com/ZrWsbcwa

flink-conf.yaml:
http://pastebin.com/xy2tz7WS

The master and slave files are populated with public IPs as well.