Re: Moving from single-node, Maven examples to cluster execution

2016-06-16 Thread Prez Cannady
All right, I figured I’d have to do shading, but hadn’t gotten around to experimenting. I’ll try it out. Prez Cannady

Re: Moving from single-node, Maven examples to cluster execution

2016-06-16 Thread Josh
Hi Prez, You need to build a jar with all your dependencies bundled inside. With Maven you can use the maven-assembly-plugin for this, or with SBT there's sbt-assembly. Once you've done this, you can log in to the JobManager node of your Flink cluster, copy the jar across and use the Flink command
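A minimal sketch of the Maven side of this, assuming a standard project layout. The plugin configuration below uses the stock jar-with-dependencies assembly descriptor; the main class name is a placeholder for your job's entry point:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <!-- Built-in descriptor that bundles all dependencies into one jar -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <archive>
      <manifest>
        <!-- Placeholder: replace with your job's main class -->
        <mainClass>com.example.MyFlinkJob</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>
```

Building with `mvn clean package` then produces a `*-jar-with-dependencies.jar` under `target/`, which can be copied to the JobManager node and submitted with `./bin/flink run`.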

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-16 Thread Ufuk Celebi
Thanks :)

On Thu, Jun 16, 2016 at 3:21 PM, Hironori Ogibayashi wrote:
> Ufuk,
>
> Yes, of course. I will be sure to update when I get some more information.
>
> Hironori
>
> 2016-06-16 1:56 GMT+09:00 Ufuk Celebi :
>> Hey Hironori,
>>
>> thanks for reporting

Documentation for translation of Job graph to Execution graph

2016-06-16 Thread Bajaj, Abhinav
Hi, When troubleshooting a Flink job, it is tricky to map the Job graph (application code) to the logs & monitoring REST APIs. So I am trying to find documentation on how a Job graph is translated to an Execution graph. I found this -

Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-16 Thread Josh
Hey, I've been running the Kinesis connector successfully now for a couple of weeks, on a Flink cluster running Flink 1.0.3 on EMR 2.7.1/YARN. Today I've been trying to get it working on a cluster running the current Flink master (1.1-SNAPSHOT) but am running into a classpath issue when starting

Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

2016-06-16 Thread Till Rohrmann
Hi Arnaud, at the moment the environment variable is the only way to specify a different config directory for the CLIFrontend. But it totally makes sense to introduce a --configDir parameter for the flink shell script. I'll open an issue for this. Cheers, Till On Thu, Jun 16, 2016 at 5:36 PM,
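A sketch of what that environment-variable workaround looks like in practice; the configuration directory path and example jar below are placeholders for illustration:

```shell
# Point the CLI frontend at an alternative configuration directory
# (the path is a placeholder). The flink script picks up FLINK_CONF_DIR
# and reads flink-conf.yaml from there instead of the default conf/ dir.
export FLINK_CONF_DIR=/path/to/custom/conf
./bin/flink run ./examples/batch/WordCount.jar
```

Setting the variable inline (`FLINK_CONF_DIR=/path/to/custom/conf ./bin/flink run ...`) also works and avoids leaking the setting into the rest of the shell session.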

Re: Elasticsearch Connector

2016-06-16 Thread Till Rohrmann
Hi Eamon, in order to use the snapshot binaries you have to add the snapshot repository to your pom.xml:

  <repository>
    <id>apache.snapshots</id>
    <name>Apache Development Snapshot Repository</name>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>

Cheers,

Re: How MapFunction gets executed?

2016-06-16 Thread Till Rohrmann
Hi Yan Chou Chen, Flink does not instantiate a new mapper for each record. Instead, it creates as many mapper instances as the parallelism you've defined. Each of these mappers is deployed to a slot on a TaskManager. When it is deployed, and before it receives records, the open method is called once.
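As an illustration of that lifecycle, a rich function makes it visible: with parallelism n, Flink creates n instances of the function below, and open() runs once per instance before any records arrive, while map() runs once per record. This is a sketch against Flink's 1.x RichMapFunction API, not code from the thread:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class UppercaseMapper extends RichMapFunction<String, String> {

    // Per-instance state: one copy per parallel subtask, not per record.
    private transient StringBuilder scratch;

    @Override
    public void open(Configuration parameters) {
        // Called exactly once per parallel instance, before any map() calls.
        // Expensive setup (connections, caches, ...) belongs here.
        scratch = new StringBuilder();
    }

    @Override
    public String map(String value) {
        // Called once per record on this subtask's single instance.
        return value.toUpperCase();
    }
}
```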

RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

2016-06-16 Thread LINZ, Arnaud
Okay, is there a way to specify the flink-conf.yaml to use on the ./bin/flink command line? I see no such option. I guess I have to set FLINK_CONF_DIR before the call?

-----Original Message-----
From: Maximilian Michels [mailto:m...@apache.org]
Sent: Wednesday, June 15, 2016 18:06
To:

Elasticsearch Connector

2016-06-16 Thread Eamon Kavanagh
Hey Support, I'm trying to use Flink's Elasticsearch connector but I'm having trouble. When I add the dependency seen here ( https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/elasticsearch2.html) to my pom file, IntelliJ can't find it. I also can't find it on the

Re: dataset dataframe join

2016-06-16 Thread Vishnu Viswanath
Thank you Till,

On Thu, Jun 16, 2016 at 10:08 AM, Till Rohrmann wrote:
> Hi Vishnu,
>
> currently the only way to do this, is to persist the DataSet (e.g. writing
> to a file) and then reading from the persisted form (e.g. file) in the open
> method of a rich function in

How MapFunction gets executed?

2016-06-16 Thread Yan Chou Chen
A quick question. When running a streaming job that executes DataStream.map(MapFunction), is a MapFunction instance created per item, or based on the parallelism? For instance, for the following code snippet: val env = StreamExecutionEnvironment.getExecutionEnvironment val

Re: dataset dataframe join

2016-06-16 Thread Till Rohrmann
Hi Vishnu, currently the only way to do this is to persist the DataSet (e.g. writing to a file) and then read from the persisted form (e.g. a file) in the open method of a rich function in the DataStream program. That way you can keep the data in your operator and then join with incoming stream
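A rough sketch of the pattern Till describes, assuming the batch job has already written the DataSet to a shared file readable by every TaskManager; the file path and the one-value-per-line format are placeholders for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Joins each streaming record against a static set that a batch job
// previously persisted to a file.
public class StaticSetJoinMapper extends RichMapFunction<String, String> {

    private transient Set<String> lookup;

    @Override
    public void open(Configuration parameters) throws IOException {
        // Load the persisted DataSet once per parallel instance,
        // before any records are processed. Placeholder path.
        lookup = new HashSet<>(Files.readAllLines(Paths.get("/shared/dataset.txt")));
    }

    @Override
    public String map(String value) {
        // "Join": tag each incoming record by membership in the static set.
        return lookup.contains(value) ? value + ",matched" : value + ",unmatched";
    }
}
```

Note the file must be reachable from every TaskManager (e.g. on a distributed filesystem), since each parallel instance loads it independently in open().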

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-16 Thread Hironori Ogibayashi
Ufuk,

Yes, of course. I will be sure to update when I get some more information.

Hironori

2016-06-16 1:56 GMT+09:00 Ufuk Celebi :
> Hey Hironori,
>
> thanks for reporting this. Could you please update this thread when
> you have more information from the Kafka list?
>
> – Ufuk