Re: CEP issue

2018-02-01 Thread Vishal Santoshi
I have flink master CEP library code imported to a 1.4 build. On Thu, Feb 1, 2018 at 5:33 PM, Vishal Santoshi wrote: > A new one > > java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3332) > at >

Re: CEP issue

2018-02-01 Thread Vishal Santoshi
A new one java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)

Re: CEP issue

2018-02-01 Thread Vishal Santoshi
It happens when it looks to throw an exception and calls shardBuffer.toString. b'coz of the check int id = sharedBuffer.entryId; Preconditions.checkState(id != -1, "Could not find id for entry: " + *sharedBuffer*); On Thu, Feb 1, 2018 at 5:09 PM, Vishal Santoshi

Re: CEP issue

2018-02-01 Thread Vishal Santoshi
The watermark has not moved for this pattern to succeed ( or other wise ), the issue though is that it is pretty early in the pipe ( like within a minute ). I am replaying from a kafka topic but the keyed operator has emitted no more than 1500 plus elements to SelectCEPOperator ( very visible on

CEP issue

2018-02-01 Thread Vishal Santoshi
This is a pretty simple pattern, as in I hardly have 1500 elements ( across 600 keys at the max ) put in and though I have a pretty wide range , as in I am looking at a relaxed pattern ( like 40 true conditions in 6 hours ), I get this. I have the EventTime turned on.

Flink REST API

2018-02-01 Thread Raja . Aravapalli
Hi, I have a triggered a Flink YARN Session on Hadoop yarn. While I was able to trigger applications and run them. I wish to find the URL of REST API for the Flink YARN Sesssion I launched. Can someone please help me point out on how to find the REST API Url for the Flink on YARN? Thanks a

Flink on AWS EMR - how to use flink-log4j configuration?

2018-02-01 Thread Ishwara Varnasi
I didn't find an example of flink-log4j configuration while creating EMR cluster for running Flink. What should be passed to "flink-log4j" config? Actual log4j config or path to file? Also, how to see application logs in EMR? thanks Ishwara Varnasi

Re: Flink on K8s job submission best practices

2018-02-01 Thread Christophe Jolif
Hi Maximilian, Coming back on this as we have similar challenges. I was leaning towards 3. But then I read you and figured I might have missed something ;) We agree 3 is not idiomatic and creates a "detached job" but in a lack of a proper solution I can live with that. We also agree there is

Get the JobID when Flink job fails

2018-02-01 Thread Vinay Patil
Hi, When the Flink job executes successfully I get the jobID, however when the Flink job fails the jobID is not returned. How do I get the jobId in this case ? Do I need to call /joboverview REST api to get the job ID by looking for the Job Name ? Regards, Vinay Patil

RE: S3 for state backend in Flink 1.4.0

2018-02-01 Thread Edward Rojas
Hi Hayden, It seems like a good alternative. But I see it's intended to work with spark, did you manage to get it working with Flink ? I some tests but I get several errors when trying to create a file, either for checkpointing or saving data. Thanks in advance, Regards, Edward -- Sent

How to handle multiple yarn sessions and choose at runtime the one to submit a ha streaming job ?

2018-02-01 Thread LINZ, Arnaud
Hello, I am using Flink 1.3.2 and I'm struggling to achieve something that should be simple. For isolation reasons, I want to start multiple long living yarn session containers (with the same user) and choose at run-time, when I start a HA streaming app, which container will hold it. I start

Latest version of Kafka

2018-02-01 Thread Marchant, Hayden
What is the newest version of Kafka that is compatible with Flink 1.4.0? I see the last version of Kafka supported is 0.11 , from documentation, but has any testing been done with Kafka 1.0? Hayden Marchant

RE: Multiple Elasticsearch sinks not working in Flink

2018-02-01 Thread Teena Kappen // BPRISE
@Fabian: I will run the code with the Git repo source and let you know the results. @Stephan: Sorry I missed the email from you somehow. I understand from the JIRA link that you already have the answer for this. Yet I tried using two separate config map objects in my code and that resolved the

Re: How does BucketingSink generate a SUCCESS file when a directory is finished

2018-02-01 Thread Kien Truong
Hi, I did not actually test this, but I think with Flink 1.4 you can extend BucketingSink and overwrite the invoke method to access the watermark Pseudo code: invoke(IN value, SinkFunction.Context context) {    long currentWatermark = context.watermark()    long taskIndex =

Reading bounded data from Kafka in Flink job

2018-02-01 Thread Marchant, Hayden
I have 2 datasets that I need to join together in a Flink batch job. One of the datasets needs to be created dynamically by completely 'draining' a Kafka topic in an offset range (start and end), and create a file containing all messages in that range. I know that in Flink streaming I can

Re: Maintain heavy hitters in Flink application

2018-02-01 Thread Timo Walther
Hi, I think it would be easier to implement a custom key selector and introduce some artifical key that spreads the load more evenly. This would also allow you to use keyed state. You could use a ProcessFunction and set timers to define the "every now and then". Keyed state would also ease

Re: Maintain heavy hitters in Flink application

2018-02-01 Thread m@xi
Anyone, someone, somebody? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

RE: S3 for state backend in Flink 1.4.0

2018-02-01 Thread Marchant, Hayden
Edward, We are using Object Storage for checkpointing. I'd like to point out that we were seeing performance problems using the S3 protocol. Btw, we had quite a few problems using the flink-s3-fs-hadoop jar with Object Storage and had to do some ugly hacking to get it working all over. We

queryable state API

2018-02-01 Thread Maciek Próchniak
Hello, Currently (1.4) to be able to use queryable state client has to know ip of (working) task manager and port. This is a bit awkward - as it forces external services to know details of flink cluster. Event more complex when we define port range for queryable state proxy and we're not sure