Re: Issue with running Flink Python jobs on cluster

2016-07-19 Thread Maximilian Michels
Hi! HDFS is mentioned in the docs but not explicitly listed as a requirement: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/python.html#project-setup I suppose the Python API could also distribute its libraries through Flink's BlobServer. Cheers, Max On Tue, Jul 19, 2016 at

Re: Issue with running Flink Python jobs on cluster

2016-07-19 Thread Chesnay Schepler
Glad to hear it! The HDFS requirement should most definitely be documented; i assumed it already was actually... On 19.07.2016 03:42, Geoffrey Mon wrote: Hello Chesnay, Thank you very much! With your help I've managed to set up a Flink cluster that can run Python jobs successfully. I solved

Re: Issue with running Flink Python jobs on cluster

2016-07-18 Thread Geoffrey Mon
Hello Chesnay, Thank you very much! With your help I've managed to set up a Flink cluster that can run Python jobs successfully. I solved my issue by removing local=True and installing HDFS in a separate cluster. I don't think it was clearly mentioned in the documentation that HDFS was required

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Chesnay Schepler
well now i know what the problem could be. You are trying to execute a job on a cluster (== not local), but have set the local flag to true. env.execute(local=True) Due to this flag the files are only copied into the tmp directory of the node where you execute the plan, and are thus not

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Geoffrey Mon
I haven't yet figured out how to write a Java job to test DistributedCache functionality between machines; I've only gotten worker nodes to create caches from local files (on the same worker nodes), rather than on files from the master node. The DistributedCache test I've been using (based on the

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Chesnay Schepler
Please also post the job you're trying to run. On 17.07.2016 08:43, Geoffrey Mon wrote: The Java program I used to test DistributedCache was faulty since it actually created the cache from files on the machine on which the program was running (i.e. the worker node). I tried implementing a

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Chesnay Schepler
Does this mean the revised DistributedCache job run successfully? On 17.07.2016 08:43, Geoffrey Mon wrote: The Java program I used to test DistributedCache was faulty since it actually created the cache from files on the machine on which the program was running (i.e. the worker node). I

Re: Issue with running Flink Python jobs on cluster

2016-07-15 Thread Geoffrey Mon
I wrote a simple Java plan that reads a file in the distributed cache and uses the first line from that file in a map operation. Sure enough, it works locally, but fails when the job is sent to a taskmanager on a worker node. Since DistributedCache seems to work for everyone else, I'm thinking

Re: Issue with running Flink Python jobs on cluster

2016-07-15 Thread Chesnay Schepler
Could you write a java job that uses the Distributed cache to distribute files? If this fails then the DC is faulty, if it doesn't something in the Python API is wrong. On 15.07.2016 08:06, Geoffrey Mon wrote: I've come across similar issues when trying to set up Flink on Amazon EC2

Re: Issue with running Flink Python jobs on cluster

2016-07-15 Thread Geoffrey Mon
I've come across similar issues when trying to set up Flink on Amazon EC2 instances. Presumably there is something wrong with my setup? Here is the flink-conf.yaml I am using: https://gist.githubusercontent.com/GEOFBOT/3ffc9b21214174ae750cc3fdb2625b71/raw/flink-conf.yaml Thanks, Geoffrey On Wed,

Re: Issue with running Flink Python jobs on cluster

2016-07-13 Thread Geoffrey Mon
Hello, Here is the TaskManager log on pastebin: http://pastebin.com/XAJ56gn4 I will look into whether the files were created. By the way, the cluster is made with virtual machines running on BlueData EPIC. I don't know if that might be related to the problem. Thanks, Geoffrey On Wed, Jul 13,

Re: Issue with running Flink Python jobs on cluster

2016-07-13 Thread Chesnay Schepler
Hello Geoffrey, How often does this occur? Flink distributes the user-code and the python library using the Distributed Cache. Either the file is deleted right after being created for some reason, or the DC returns a file name before the file was created (which shouldn't happen, it should

Issue with running Flink Python jobs on cluster

2016-07-12 Thread Geoffrey Mon
Hello all, I've set up Flink on a very small cluster of one master node and five worker nodes, following the instructions in the documentation ( https://ci.apache.org/projects/flink/flink-docs-master/setup/cluster_setup.html). I can run the included examples like WordCount and PageRank across the