Re: [DISCUSS] Enhance Window Evictor in Flink

2016-07-06 Thread Vishnu Viswanath
Thank you Maxim and Aljoscha. Yes the beforeEvict and afterEvict should able address point 3. I have one more use case in my mind (which I might have to do in the later stages of POC). What if the `evictAfter` should behave differently based on the window function. For example. I have a window t

Re: [DISCUSS] FLIP 1 - Flink Improvement Proposal

2016-07-06 Thread Aljoscha Krettek
Jip, that's why I referenced the Kafka process which is also in their wiki. On Wed, 6 Jul 2016 at 21:01 Stephan Ewen wrote: > Yes, big +1 > > I had actually talked about the same thing with some people as well. > > I am currently sketching a few FLIPs for things, like improvements to the > Yarn/

Re: [DISCUSS] Enhance Window Evictor in Flink

2016-07-06 Thread Maxim
Actually for such evictor to be useful the window should be sorted by some field, usually event time. What do you think about adding sorted window abstraction? On Wed, Jul 6, 2016 at 11:36 AM, Aljoscha Krettek wrote: > @Maxim: That's perfect I didn't think about using Iterator.remove() for > tha

Re: [DISCUSS] FLIP 1 - Flink Improvement Proposal

2016-07-06 Thread Stephan Ewen
Yes, big +1 I had actually talked about the same thing with some people as well. I am currently sketching a few FLIPs for things, like improvements to the Yarn/Mesos/Kubernetes integration One thing we should do here is to actually structure the wiki a bit to make it easier to find information

Re: [DISCUSS] Enhance Window Evictor in Flink

2016-07-06 Thread Aljoscha Krettek
@Maxim: That's perfect I didn't think about using Iterator.remove() for that. I'll update the doc. What do you think Vishnu? This should also cover your before/after case nicely. @Vishnu: The steps would be these: - Converge on a design in this discussion - Add a Jira issue here: https://issues.

Re: [DISCUSS] Enhance Window Evictor in Flink

2016-07-06 Thread Maxim
The new API forces iteration through every element of the buffer even if a single value to be evicted. What about implementing Iterator.remove() method for elements? The API would look like: public interface Evictor extends Serializable { /** * Optionally evicts elements. Called before wi

Re: [DISCUSS] Enhance Window Evictor in Flink

2016-07-06 Thread Vishnu Viswanath
Hi Aljoscha, Thanks. Yes the new interface seems to address points 1 and 2. of *1) I am having a use case where I have to create a custom Evictor that will evict elements from the window based on the value (e.g., if I have elements are of case class Item(id: Int, type:String) then evict elements

[DISCUSS] Enhance Window Evictor in Flink

2016-07-06 Thread Aljoscha Krettek
Hi, as mentioned in the thread on improving the Windowing API I also have a design doc just for improving WindowEvictors. I had this in my head for a while but was hesitant to publish but since people are asking about this now might be a good time to post it. Here's the doc: https://docs.google.com

Re: [DISCUSS] Allowed Lateness in Flink

2016-07-06 Thread Aljoscha Krettek
@Vishnu Funny you should ask that because I have a design doc lying around. I'll open a new mail thread to not hijack this one. On Wed, 6 Jul 2016 at 17:17 Vishnu Viswanath wrote: > Hi, > > I was going through the suggested improvements in window, and I have > few questions/suggestion on improve

Re: [DISCUSS] Allowed Lateness in Flink

2016-07-06 Thread Vishnu Viswanath
Hi, I was going through the suggested improvements in window, and I have few questions/suggestion on improvement regarding the Evictor. 1) I am having a use case where I have to create a custom Evictor that will evict elements from the window based on the value (e.g., if I have elements are of ca

Connecting Flink and Hive

2016-07-06 Thread Alan Gates
I’d like to work on creating a Flink Sink for Hive’s streaming ingest[1]. But I recall recently seeing a message on the dev list about moving some of the third party connectors out of Flink as devs were having problems maintaining them. So, is this the sort of thing I should contribute to Fli

Re: [DISCUSS] FLIP 1 - Flink Improvement Proposal

2016-07-06 Thread Ufuk Celebi
Hey Aljoscha, thanks for this proposal. I've somehow missed it last week. I like the idea very much and agree with your assessment about the problems with the Google Doc approach. Regarding the process: I'm also in favour of adopting it from Kafka. I would not expect any problems with this, but w

[jira] [Created] (FLINK-4162) Event-Time CEP Job Fails after Restart

2016-07-06 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-4162: --- Summary: Event-Time CEP Job Fails after Restart Key: FLINK-4162 URL: https://issues.apache.org/jira/browse/FLINK-4162 Project: Flink Issue Type: Bug

Re: [DISCUSS] Allowed Lateness in Flink

2016-07-06 Thread Aljoscha Krettek
I did: https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3ccanmxww0abttjjg9ewdxrugxkjm7jscbenmvrzohpt2qo3pq...@mail.gmail.com%3e ;-) On Wed, 6 Jul 2016 at 15:31 Ufuk Celebi wrote: > On Wed, Jul 6, 2016 at 3:19 PM, Aljoscha Krettek > wrote: > > In the future, it might be good to

Re: [DISCUSS] Allowed Lateness in Flink

2016-07-06 Thread Ufuk Celebi
On Wed, Jul 6, 2016 at 3:19 PM, Aljoscha Krettek wrote: > In the future, it might be good to to discussions directly on the ML and > then change the document accordingly. This way everyone can follow the > discussion on the ML. I also feel that Google Doc comments often don't give > enough space f

Re: [DISCUSS] Allowed Lateness in Flink

2016-07-06 Thread Aljoscha Krettek
Hi, I cleaned up the document a bit and added sections to address comments on the doc: https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing (I also marked proposed features that are already implemented as [done].) The main thing that remains to be figure

[jira] [Created] (FLINK-4161) Quickstarts can exclude more flink-dist dependencies

2016-07-06 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-4161: --- Summary: Quickstarts can exclude more flink-dist dependencies Key: FLINK-4161 URL: https://issues.apache.org/jira/browse/FLINK-4161 Project: Flink Issu

[jira] [Created] (FLINK-4160) YARN session doesn't show input validation errors

2016-07-06 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-4160: - Summary: YARN session doesn't show input validation errors Key: FLINK-4160 URL: https://issues.apache.org/jira/browse/FLINK-4160 Project: Flink Issue Type:

[jira] [Created] (FLINK-4159) Quickstart poms exclude unused dependencies

2016-07-06 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-4159: --- Summary: Quickstart poms exclude unused dependencies Key: FLINK-4159 URL: https://issues.apache.org/jira/browse/FLINK-4159 Project: Flink Issue Type: B

Re: Issues while interacting with DynamoDB

2016-07-06 Thread Aljoscha Krettek
Hi, are you running this on Yarn. If yes, the EMR Yarn installation might already have some of the AWS jars in the classpath and that might interact badly with the Jars that you manually put into the flink/lib folder. Cheers, Aljoscha P.S. In the future, please use the user mailing list for reque

[jira] [Created] (FLINK-4158) Scala QuickStart StreamingJob fails to compile

2016-07-06 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-4158: --- Summary: Scala QuickStart StreamingJob fails to compile Key: FLINK-4158 URL: https://issues.apache.org/jira/browse/FLINK-4158 Project: Flink Issue Type

[jira] [Created] (FLINK-4157) FlinkKafkaMetrics cause TaskManager shutdown during cancellation

2016-07-06 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-4157: - Summary: FlinkKafkaMetrics cause TaskManager shutdown during cancellation Key: FLINK-4157 URL: https://issues.apache.org/jira/browse/FLINK-4157 Project: Flink

Re: [Discussion] Query Regarding Operator chaining

2016-07-06 Thread Aljoscha Krettek
Hi, unfortunately the reading of one Kafka partition cannot be split among several parallel instances of the Kafka source. So if you have only 2 partitions your reading parallelism is limited to that. You are right that this can lead to bad performance and underutilization. The only solution I see

[jira] [Created] (FLINK-4156) Job with -m yarn-cluster registers TaskManagers to another running Yarn session

2016-07-06 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-4156: - Summary: Job with -m yarn-cluster registers TaskManagers to another running Yarn session Key: FLINK-4156 URL: https://issues.apache.org/jira/browse/FLINK-4156 Proje