See inline

On Tue, Nov 25, 2014 at 3:37 PM, Robert Metzger <[email protected]> wrote:
> Hey,
>
> maybe we need to go a step back because I did not yet fully understand
> what you want to do.
>
> My understanding so far is the following:
> - You have a set of jobs that you've written for Flink

Yes, and they are all in the same jar (that I want to put on the cluster somehow).

> - You have a cluster with Flink running

Yes!

> - You have an external client, which is a Java application that is
> controlling when and how the different jobs are launched. The client is
> running basically 24/7 or started by a cronjob.

I have a Java application somewhere that triggers the execution of one of the
available jobs in the jar (so I also need to pass the necessary arguments
required by each job) and then monitors whether the job has been put into a
running state, and its status (running/failed/finished; a completion
percentage would be awesome). I don't think the RemoteExecutor is enough...
am I wrong?

> Correct me if these assumptions are wrong. If they are true, the
> RemoteExecutor is probably what you are looking for. Otherwise, we have to
> find another solution.
>
> On Tue, Nov 25, 2014 at 2:56 PM, Flavio Pompermaier <[email protected]>
> wrote:
>
>> Hi Robert,
>> I tried to look at the RemoteExecutor but I can't understand what the
>> exact steps are to:
>> 1 - (upload if necessary and) register a jar containing multiple main
>> methods (one for each job)
>> 2 - start the execution of a job from a client
>> 3 - monitor the execution of the job
>>
>> Could you give me the exact Java commands/snippets to do that?
>>
>> On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <[email protected]>
>> wrote:
>>
>>> +1 for providing some utilities/tools for application developers.
>>> This could include something like an application registry. I also think
>>> that almost every user needs something to parse command-line arguments
>>> (including default values and comprehensive error messages).
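The kind of command-line parsing utility Robert mentions could be sketched as a small self-contained helper; the class and method names here are illustrative, not an existing Flink utility:

```java
import java.util.HashMap;
import java.util.Map;

// A minimal command-line parser with default values and a clear error
// message, along the lines suggested above: arguments are given as
// "--key value" pairs; missing keys fall back to a caller-supplied default.
public class SimpleArgs {
    private final Map<String, String> values = new HashMap<>();

    public SimpleArgs(String[] args) {
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException(
                    "Expected a --key at position " + i + " but got: " + args[i]);
            }
            values.put(args[i].substring(2), args[i + 1]);
        }
    }

    public String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    public int getInt(String key, int defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Integer.parseInt(v);
    }
}
```

Each job's main method can then pull out its own parameters with per-key defaults instead of positional args.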
>>> We should also see if we can document and properly expose the FileSystem
>>> abstraction to Flink app programmers. Users sometimes need to manipulate
>>> files directly.
>>>
>>> Regarding your second question:
>>> For deploying a jar on your cluster, you can use the "bin/flink run <JAR
>>> FILE>" command.
>>> For starting a job from an external client you can use the
>>> RemoteExecutionEnvironment (you need to know the JobManager address for
>>> that). Here is some documentation on that:
>>> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>>>
>>> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <
>>> [email protected]> wrote:
>>>
>>>> That was exactly what I was looking for. In my case it is not a problem
>>>> to use the Hadoop version because I work on Hadoop. Don't you think it
>>>> could be useful to add a Flink ProgramDriver so that you can use it both
>>>> for Hadoop and native-Flink jobs?
>>>>
>>>> Now that I understand how to bundle a bunch of jobs together, my next
>>>> objective will be to deploy the jar on the cluster (similarly to what the
>>>> webclient does) and then start the jobs from my external client (which in
>>>> theory just needs to know the jar name and the parameters to pass to every
>>>> job it wants to call). Do you have an example of that?
>>>>
>>>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <[email protected]> wrote:
>>>>
>>>>> Are you looking for something like
>>>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
>>>>> ?
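The remote-environment approach from the cluster-execution docs linked above boils down to the following sketch; the hostname, port, and paths are placeholders for your own setup:

```java
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class RemoteJob {
    public static void main(String[] args) throws Exception {
        // Connect to a running JobManager; the listed jar (containing the
        // user code, e.g. the filter function below) is shipped to the cluster.
        ExecutionEnvironment env = ExecutionEnvironment
            .createRemoteEnvironment("flink-master", 6123, "/path/to/myjobs.jar");

        DataSet<String> data = env.readTextFile("hdfs://path/to/file");

        data
            .filter(new FilterFunction<String>() {
                public boolean filter(String value) {
                    return value.startsWith("http://");
                }
            })
            .writeAsText("hdfs://path/to/result");

        // Triggers execution on the remote cluster, not locally.
        env.execute();
    }
}
```

Note that this only submits and runs a single program; it does not by itself give you the registry/monitoring features discussed above.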
>>>>> You should be able to use the Hadoop ProgramDriver directly; see for
>>>>> example here:
>>>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>>>
>>>>> If you don't want to introduce a Hadoop dependency in your project,
>>>>> you can just copy-paste ProgramDriver; it does not have any dependencies
>>>>> on Hadoop classes. That class just accumulates <String, Class> pairs
>>>>> (simplifying a bit) and calls the main method of the corresponding class.
>>>>>
>>>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Not sure I get exactly what this is, but packaging multiple examples
>>>>>> in one program is well possible. You can have arbitrary control flow in
>>>>>> the main() method.
>>>>>>
>>>>>> It should be well possible to do something like that Hadoop examples
>>>>>> setup...
>>>>>>
>>>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> That was something I used to do with Hadoop, and it's comfortable
>>>>>>> when testing stuff (so it is not so important).
>>>>>>> For an example, see what happens when you run the old "hadoop jar
>>>>>>> hadoop-mapreduce-examples.jar" command... it "drives" you to the
>>>>>>> correct invocation of that job.
>>>>>>> However, the important thing is that I'd like to keep existing
>>>>>>> related jobs somewhere (like a repository of jobs), deploy them, and
>>>>>>> then be able to start the one I need from an external program.
>>>>>>>
>>>>>>> Could this be done with the RemoteExecutor? Or is there any web
>>>>>>> service to manage the job execution? That would be very useful...
>>>>>>> Is the Client interface the only one that allows something similar
>>>>>>> right now?
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am not sure exactly what you need there.
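The copy-paste ProgramDriver pattern Kostas describes, stripped of the Hadoop dependency, can be sketched as below; JobDriver and EchoJob are hypothetical names for illustration:

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// A stripped-down ProgramDriver lookalike: it accumulates <String, Class>
// pairs and dispatches the remaining arguments to the chosen job's
// main method via reflection.
public class JobDriver {
    private final Map<String, Class<?>> jobs = new TreeMap<>();

    public void addJob(String name, Class<?> mainClass) {
        jobs.put(name, mainClass);
    }

    public void run(String[] args) throws Exception {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            throw new IllegalArgumentException(
                "Unknown job. Valid job names: " + jobs.keySet());
        }
        Method main = jobs.get(args[0]).getMethod("main", String[].class);
        // Pass everything after the job name on to the job itself.
        String[] jobArgs = Arrays.copyOfRange(args, 1, args.length);
        main.invoke(null, (Object) jobArgs);
    }
}

// A trivial example job so the dispatch can be observed.
class EchoJob {
    static String lastArg;

    public static void main(String[] args) {
        lastArg = args.length > 0 ? args[0] : null;
    }
}
```

The bundled jar's manifest main class would then be a small main() that registers every job with the driver and calls run(args).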
>>>>>>>> In Flink you can write
>>>>>>>> more than one program in the same program ;-) You can define complex
>>>>>>>> flows and execute arbitrarily at intermediate points:
>>>>>>>>
>>>>>>>> main() {
>>>>>>>>     ExecutionEnvironment env = ...;
>>>>>>>>
>>>>>>>>     env.readSomething().map().join(...).and().so().on();
>>>>>>>>     env.execute();
>>>>>>>>
>>>>>>>>     env.readTheNextThing().doSomething();
>>>>>>>>     env.execute();
>>>>>>>> }
>>>>>>>>
>>>>>>>> You can also just "save" a program and keep it for later execution:
>>>>>>>>
>>>>>>>> Plan plan = env.createProgramPlan();
>>>>>>>>
>>>>>>>> At a later point you can start that plan:
>>>>>>>>
>>>>>>>> new RemoteExecutor(master, 6123).execute(plan);
>>>>>>>>
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Any help on this? :(
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi guys,
>>>>>>>>>> I forgot to ask you if there's a Flink utility to simulate the
>>>>>>>>>> Hadoop ProgramDriver class that acts somehow like a registry of
>>>>>>>>>> jobs. Is there something similar?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Flavio
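Stephan's two snippets, assembled into one deferred-execution sketch; the hostname, port, and paths are placeholders, and the executePlan call follows the 0.7-incubating RemoteExecutor API (adjust the method name if your version differs):

```java
import org.apache.flink.api.common.Plan;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.client.RemoteExecutor;

public class DeferredExecution {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        env.readTextFile("hdfs://path/to/input")
           .writeAsText("hdfs://path/to/output");

        // "Save" the program: build the plan now without executing it.
        Plan plan = env.createProgramPlan();

        // Later (possibly from another component), ship the plan to the
        // JobManager and run it on the cluster.
        RemoteExecutor executor = new RemoteExecutor("flink-master", 6123);
        executor.executePlan(plan);
    }
}
```

This covers starting a saved program remotely; monitoring its status, as asked for at the top of the thread, is not part of this API.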
