Re: Broadcast Config through Connected Stream

2017-09-25 Thread Navneeth Krishnan
Thanks a lot Aljoscha. That helps. On Mon, Sep 25, 2017 at 4:47 AM, Aljoscha Krettek wrote: > Hi, > > I think this is a valid approach, you can even use "operator state" in > your map function to make the broadcast config state stateful. > > Another approach would be to use

Flink on EMR

2017-09-25 Thread Navneeth Krishnan
Hello All, I'm trying to deploy Flink on AWS EMR and I'm very new to EMR. I'm running into multiple issues and need some help. Issue 1: How did others resolve the multiple-bindings issue? SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in
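
A common fix is to make sure only one SLF4J binding ends up on the classpath, e.g. by excluding the binding that a dependency pulls in transitively. A minimal sketch, assuming an sbt build; the dependency shown is illustrative, and which binding to exclude depends on what the log reports as duplicated:

    // sbt: drop the transitive slf4j-log4j12 binding so only one remains
    libraryDependencies += ("org.apache.flink" %% "flink-streaming-scala" % "1.3.2")
      .exclude("org.slf4j", "slf4j-log4j12")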

Questions about checkpoints/savepoints

2017-09-25 Thread vipul singh
Hello, I have some confusion about checkpoints vs. savepoints and how to use them effectively in my application. I am working on an application which relies on Flink's fault-tolerance mechanism to ensure exactly-once semantics. I have enabled external checkpointing in my application as below:
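
For reference, enabling externalized checkpoints with the Flink 1.3 API looks roughly like the sketch below; the interval and retention mode are illustrative, not necessarily the poster's settings:

    import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup
    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(60000) // checkpoint every 60 s
    // keep the externalized checkpoint when the job is cancelled
    env.getCheckpointConfig.enableExternalizedCheckpoints(
      ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)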

java.lang.OutOfMemoryError: Java heap space at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)

2017-09-25 Thread sohimankotia
Hi, I am getting a Java heap space error while running a Flink job (Flink 1.2). Use case: I am getting all keys from Redis matching a specific pattern, then streaming over those keys, reading the data from Redis for each key, and writing it to a file in HDFS. The job was running fine for a few days but

FLIP-6: Job specific Docker images status?

2017-09-25 Thread Elias Levy
I was wondering what the status is of support for job-specific Docker images, meaning images that combine the job jars with the job manager, do not require job submission, and automatically execute the job when there are enough task managers registered with the job manager to satisfy the job's

CheckpointConfig says to persist periodic checkpoints, but no checkpoint directory has been configured.

2017-09-25 Thread Hao Sun
Hi, I am running Flink in dev mode through IntelliJ. I have flink-conf.yaml correctly configured, and from the log you can see the JobManager is reading it. 2017-09-25 20:41:52.255 [main] INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend,
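
One workaround, assuming the IDE-launched mini cluster is not passing the YAML settings on to the runtime (an assumption, not a confirmed diagnosis), is to configure the state backend and checkpoint directory programmatically:

    import org.apache.flink.runtime.state.filesystem.FsStateBackend
    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(10000)
    // checkpoint path is illustrative; set it in code instead of relying
    // on flink-conf.yaml being picked up by the local environment
    env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"))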

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

2017-09-25 Thread Elias Levy
Why a range instead of just a single port in HA mode? On Mon, Sep 25, 2017 at 1:49 PM, Till Rohrmann wrote: > Yes, with Flip-6 it will most likely look like how Stephan described it. > We need the explicit port in standalone mode so that TMs can connect to the > JM. In the

How to clear registered timers for a merged window?

2017-09-25 Thread Yan Zhou [FDS Science]
Hi, I am implementing a merge-able trigger and am having a problem clearing the registered timers for a merged window (a window that has been merged into the merging result). In my implementation, the trigger registers multiple timers for each element in Trigger#onElement(). State is used to keep
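
For context, a merging trigger has to override canMerge/onMerge, and timers registered for the pre-merge windows cannot be deleted individually once those windows are gone; a common pattern is to re-register the timer needed for the merged window and let stale timers fire as no-ops. A minimal event-time sketch, simplified and not the poster's actual trigger:

    import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow

    class MergeAwareTrigger extends Trigger[Any, TimeWindow] {
      override def onElement(e: Any, ts: Long, w: TimeWindow, ctx: Trigger.TriggerContext): TriggerResult = {
        ctx.registerEventTimeTimer(w.maxTimestamp)
        TriggerResult.CONTINUE
      }
      override def onEventTime(time: Long, w: TimeWindow, ctx: Trigger.TriggerContext): TriggerResult =
        // stale timers from pre-merge windows fail this check and are ignored
        if (time == w.maxTimestamp) TriggerResult.FIRE else TriggerResult.CONTINUE
      override def onProcessingTime(time: Long, w: TimeWindow, ctx: Trigger.TriggerContext): TriggerResult =
        TriggerResult.CONTINUE
      override def canMerge(): Boolean = true
      override def onMerge(w: TimeWindow, ctx: Trigger.OnMergeContext): Unit =
        ctx.registerEventTimeTimer(w.maxTimestamp) // timer for the merged window
      override def clear(w: TimeWindow, ctx: Trigger.TriggerContext): Unit =
        ctx.deleteEventTimeTimer(w.maxTimestamp)
    }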

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

2017-09-25 Thread Till Rohrmann
Yes, with Flip-6 it will most likely look like how Stephan described it. We need the explicit port in standalone mode so that TMs can connect to the JM. In the other deployment scenarios, the port can be randomly picked unless you want to specify a port range, e.g. for firewall configuration
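
For reference, the port range mentioned above goes directly into the configuration value; a sketch of flink-conf.yaml entries (values are illustrative):

    # fixed RPC port for standalone mode, so TMs can find the JM
    jobmanager.rpc.port: 6123
    # port (or port range) used in HA mode, e.g. for firewall rules
    high-availability.jobmanager.port: 50000-50025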

Flink Job on Docker on Mesos cluster

2017-09-25 Thread Rahul Raj
Hi All, I am working on a project which involves running Flink jobs in Docker containers on a Mesos cluster, but I am failing to understand how Docker and Flink will work together on Mesos. Can anyone explain to me how a Flink cluster will run in Docker containers on Mesos? If I create a docker image

Re: Can i use Avro serializer as default instead of kryo serializer in Flink for scala API ?

2017-09-25 Thread Nico Kruber
Hi Shashank, enabling Avro as the default de/serializer for Flink should be as simple as the following, according to [1]:

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.getConfig.enableForceAvro()

I am, however, no expert on this and the implications regarding the use of Avro

Re: Using HiveBolt from storm-hive with Flink-Storm compatibility wrapper

2017-09-25 Thread Timo Walther
Hi Federico, I think going through a Storm compatibility layer could work, but did you think about using the flink-jdbc connector? That should be the easiest solution. Otherwise, I think it would be easier to quickly implement your own SinkFunction. It is just one method that you have to
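
A custom sink is indeed small; a minimal sketch of the single method Timo refers to (the record type and the actual write call are illustrative assumptions):

    import org.apache.flink.streaming.api.functions.sink.SinkFunction
    import org.apache.flink.types.Row

    class MyHiveSink extends SinkFunction[Row] {
      // called once per record; write it to the external system here,
      // e.g. via a JDBC INSERT against Hive (assumption)
      override def invoke(value: Row): Unit = {
        // write logic goes here
      }
    }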

Re: Building scala examples

2017-09-25 Thread Nico Kruber
Hi Michael, from what I see, Java and Scala examples reside in different packages, e.g.
* org.apache.flink.streaming.scala.examples.async.AsyncIOExample vs.
* org.apache.flink.streaming.examples.async.AsyncIOExample
A quick run on the Flink 1.3 branch revealed flink-examples-

Re: Rule expression for CEP library

2017-09-25 Thread Gábor Gévay
Hello Shailesh, There is a Flink Improvement Proposal for Integration of SQL and CEP: https://cwiki.apache.org/confluence/display/FLINK/FLIP-20:+Integration+of+SQL+and+CEP Best, Gábor On Mon, Sep 25, 2017 at 3:21 PM, Shailesh Jain wrote: > Hi, > > Apart from the

RE: Flink kafka consumer that read from two partitions in local mode

2017-09-25 Thread Sofer, Tovi
Hi Gordon, thanks for your assistance.
- We are running Flink currently in local mode (MiniCluster), using Flink 1.3.2 and flink-connector-kafka-0.10_2.10.
- In the consumer log I see only 1 partition (when parallelism=1), so the problem indeed seems to be in the producer.

Re: Using HiveBolt from storm-hive with Flink-Storm compatibility wrapper

2017-09-25 Thread Nico Kruber
Hi Federico, I also did not find any implementation of a Hive sink, nor many details on this topic in general. Let me forward this to Timo and Fabian (cc'd), who may know more. Nico On Friday, 22 September 2017 12:14:32 CEST Federico D'Ambrosio wrote: > Hello everyone, > > I'd like to use the

Re: History Server

2017-09-25 Thread Nico Kruber
Hi Elias, in theory it could be integrated into a single web interface, but this has not been done so far. I guess the main reason for keeping it separate was to have a better separation of concerns, as the history server is actually independent of the current JobManager execution and

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

2017-09-25 Thread Stephan Ewen
/cc Till for real this time ;-) Hi! I think that can probably be simplified in the FLIP-6 case: - All RPC is only between JM and TM and the port should be completely random (optionally within a range). TM and JM discover each other via HA (ZK) or the TM gets the JM RPC port as a parameter

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

2017-09-25 Thread Stephan Ewen
Hi! I think that can probably be simplified in the FLIP-6 case: - All RPC is only between JM and TM and the port should be completely random (optionally within a range). TM and JM discover each other via HA (ZK) or the TM gets the JM RPC port as a parameter when the container is started.

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

2017-09-25 Thread Nico Kruber
Hi Elias, indeed that looks strange but was introduced with FLINK-3172 [1] with an argument about using the same configuration key (as opposed to having two different keys as mentioned) starting at https://issues.apache.org/jira/browse/FLINK-3172?focusedCommentId=15091940#comment-15091940

Rule expression for CEP library

2017-09-25 Thread Shailesh Jain
Hi, Apart from the Java/Scala API for the CEP library, is there any other way to express patterns/rules which can be run on the Flink engine? Are there any plans to add a DSL/rule-expression language for CEP anytime soon? If not, any pointers on how it can be achieved now would be really helpful.
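
For reference, the Java/Scala API mentioned above expresses rules programmatically, e.g. with the CEP Scala API (Event is a hypothetical type):

    import org.apache.flink.cep.scala.pattern.Pattern

    case class Event(name: String, value: Double)

    // "a positive reading directly followed by a negative one"
    val pattern = Pattern.begin[Event]("start").where(_.value > 0)
      .next("end").where(_.value < 0)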

RE: Flink kafka consumer that read from two partitions in local mode

2017-09-25 Thread Tzu-Li (Gordon) Tai
Hi Tovi, Your code seems to be correct, and as Fabian described, you don’t need a parallelism of 2 to read 2 partitions; a single parallel instance of the source can read multiple partitions. From a first look, I’m not sure what could have gone wrong, so I may need to
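
To illustrate the point about partitions, a single source subtask can consume every partition of a topic; a sketch assuming flink-connector-kafka-0.10 (topic name and properties are illustrative):

    import java.util.Properties
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "test-group")

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // parallelism 1: this single consumer instance reads both partitions
    val stream = env
      .addSource(new FlinkKafkaConsumer010[String]("my-topic", new SimpleStringSchema(), props))
      .setParallelism(1)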

Re: Broadcast Config through Connected Stream

2017-09-25 Thread Aljoscha Krettek
Hi, I think this is a valid approach, you can even use "operator state" in your map function to make the broadcast config state stateful. Another approach would be to use internal APIs to hack an operator that has a keyed stream on one input and a broadcast stream on the second input. You can
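
A minimal sketch of the first approach, connecting a broadcast config stream to the data stream (Event and Config are hypothetical placeholder types):

    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.util.Collector

    case class Config(threshold: Double)
    case class Event(value: Double)

    class ApplyConfig extends CoFlatMapFunction[Event, Config, Event] {
      private var config: Config = _
      override def flatMap1(event: Event, out: Collector[Event]): Unit =
        if (config != null && event.value > config.threshold) out.collect(event)
      override def flatMap2(newConfig: Config, out: Collector[Event]): Unit =
        config = newConfig // remember the latest broadcast config
    }

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val events = env.fromElements(Event(1.0), Event(5.0))
    val configs = env.fromElements(Config(2.0))
    // broadcast() sends every config update to all parallel instances
    val filtered = events.connect(configs.broadcast).flatMap(new ApplyConfig)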

Re: Unable to run flink job after adding a new dependency (cannot find Main-class)

2017-09-25 Thread Federico D'Ambrosio
As a little update, the pattern for the exclusion of those files in sbt-assembly is the following:

    assemblyMergeStrategy in assembly := {
      case PathList(ps @ _*) if ps.last.endsWith(".DSA") || ps.last.endsWith(".SF") || ps.last.endsWith(".RSA") =>
        MergeStrategy.discard
      // Other MergeStrategies

Re: Unable to run flink job after adding a new dependency (cannot find Main-class)

2017-09-25 Thread Federico D'Ambrosio
Hi Urs, Thank you very much for your advice, I will look into excluding those files directly during the assembly. 2017-09-25 10:58 GMT+02:00 Urs Schoenenberger <urs.schoenenber...@tngtech.com>: > Hi Federico, > > oh, I remember running into this problem some time ago. If I recall > correctly,

Re: StreamCorruptedException

2017-09-25 Thread Tzu-Li (Gordon) Tai
I talked a bit with Kostas on what may be happening here. It could be that your patterns are not closing, which depends on the pattern construction of your CEP job. Could you perhaps provide an overview / code snippet of what your CEP job is doing? Looping in Kostas (in CC) on this thread as

Re: Unable to run flink job after adding a new dependency (cannot find Main-class)

2017-09-25 Thread Urs Schoenenberger
Hi Federico, oh, I remember running into this problem some time ago. If I recall correctly, this is not a Flink issue, but an issue with technically incorrect jars from dependencies which prevent the verification of the manifest. I was using the maven-shade plugin back then and configured an

Re: Exception : Table of atomic type can only have a single field, when transferring DataStream to Table ?

2017-09-25 Thread Timo Walther
Hi, I also replied to your Stackoverflow question. I think the problem is that BillCount has the wrong type and is therefore treated as one single black box. Haohui's suggestion will not work because the row type needs information about the fields. The easiest thing is to figure out why

Re: Unable to run flink job after adding a new dependency (cannot find Main-class)

2017-09-25 Thread Federico D'Ambrosio
Hi Urs, Yes the main class is set, just like you said. Still, I might have managed to get it working: during the assembly some .SF, .DSA and .RSA files are put inside the META-INF folder of the jar, possibly coming from some of the new dependencies in the deps tree. Apparently, this caused this

Re: akka timeout

2017-09-25 Thread Till Rohrmann
Quick question, Steven: where did you find the documentation stating that the death watch interval is linked to the Akka ask timeout? It was included in the past, but I couldn't find it anymore. Cheers, Till On Mon, Sep 25, 2017 at 9:47 AM, Till Rohrmann wrote: > Great

Re: Unable to run flink job after adding a new dependency (cannot find Main-class)

2017-09-25 Thread Urs Schoenenberger
Hi Federico, just guessing, but are you explicitly setting the Main-Class manifest attribute for the jar that you are building? Should be something like:

    mainClass in (Compile, packageBin) := Some("org.yourorg.YourFlinkJobMainClass")

Best, Urs On 23.09.2017 17:53, Federico D'Ambrosio wrote: >

Re: akka timeout

2017-09-25 Thread Till Rohrmann
Great to hear that you could figure things out Steven. You are right. The death watch is no longer linked to the akka ask timeout, because of FLINK-6495. Thanks for the feedback. I will correct the documentation. Cheers, Till On Sat, Sep 23, 2017 at 10:24 AM, Steven Wu

Re: Exception : Table of atomic type can only have a single field, when transferring DataStream to Table ?

2017-09-25 Thread Haohui Mai
Hi, I think instead of generating DataStream[BillCount], the correct way is to generate DataStream[Row], that is:

    kafkaInputStream.map(value -> Row.of(value.getLogis_id, value.getProvince_id,
        value.getCity_id, value.getOrder_require_varieties, value.getOrder_rec_amount,
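
Building on this and on Timo's point that the row type needs field information, a Scala sketch that attaches explicit per-field types; only three of the fields from the thread are shown, and their types are assumptions:

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo
    import org.apache.flink.api.java.typeutils.RowTypeInfo
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.types.Row

    // Row is otherwise treated as a single black-box atomic type, so
    // supply the per-field TypeInformation explicitly
    implicit val rowType: RowTypeInfo = new RowTypeInfo(
      BasicTypeInfo.STRING_TYPE_INFO,
      BasicTypeInfo.LONG_TYPE_INFO,
      BasicTypeInfo.LONG_TYPE_INFO)

    val rows: DataStream[Row] = kafkaInputStream.map { value =>
      Row.of(value.getLogis_id, value.getProvince_id, value.getCity_id)
    }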