Re: [DISCUSS] FLIP-18: Code Generation for improving sorting performance

2017-03-23 Thread Gábor Gévay
Hello, > Second, additional classes will turn performance critical callsites > megamorphic. Yes, this is a completely valid point, thanks for raising this issue Greg. We were planning to have an offline discussion tomorrow with Pattarawat about this. We have a few options: 1. We could fuse the

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-14 Thread Gábor Gévay
Hello, Pat, the table in your email is somehow not visible in my gmail, but it is visible here: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/FLINK-5734-Code-Generation-for-NormalizedKeySorter-tt15804.html#a15936 Maybe the problem is caused by the formatting. > FLINK-3722 >

Backpressure rationale

2017-01-12 Thread Gábor Gévay
Hello, I would like to ask about the rationale behind the backpressure mechanism in Flink. As I understand it, backpressure is for handling the problem of one operator (or source) producing records faster than the next operator can consume them. However, an alternative solution would be to have

Re: [DISCUSS] Add Side Input/Broadcast Set For Streaming API

2016-12-20 Thread Gábor Gévay
Hello, I am also interested in this feature for a paper that I'm writing. I have the "slowly evolving side input" case with a complicated custom "update precondition" that would be expressible by a stateful UDF that makes its decisions from looking at the elements of the main stream. Best,

Re: [DISCUSS] Make FieldAccessor logic consistent with remaining API

2016-10-27 Thread Gábor Gévay
Hello, Thanks Fabian for starting this discussion. I would just like to add a few thoughts about why the FieldAccessors even exist. One could say that we should instead just re-use ExpressionKeys for the aggregations, and we are done. As far as I can see, the way ExpressionKeys is often

Re: Contribution

2016-09-09 Thread Gábor Gévay
Hi Hasan, Welcome! There is the "starter" label on some Jiras, which means that the issue is good for getting started. Best, Gábor 2016-09-09 13:46 GMT+02:00 Hasan Gürcan : > Hi devs, > > i contributed to Stratosphere as I was studying computer science

Nested field expressions PR

2016-08-14 Thread Gábor Gévay
Hello, I have a PR for making FieldAccessors support nested field expressions (like .sum("f1.foo.bar")) that has been open for quite a long time: https://github.com/apache/flink/pull/2094 Although it looks like a lot of code, most of it is actually fairly straightforward, so it

Re: N-ary stream operators - status

2016-08-10 Thread Gábor Gévay
ut-wrapper > > Cheers, > Aljoscha > > On Tue, 9 Aug 2016 at 17:54 Gábor Gévay <gga...@gmail.com> wrote: > >> Hello, >> >> There is this Google Doc about adding n-ary stream operators to Flink: >> >> https://docs.google.com/document/d/1ZFzL_0

Re: Iteration Intermediate Output

2016-05-30 Thread Gábor Gévay
Hello, > Would the best way be to extend the iteration operators to support > intermediate outputs or revisit the idea of caching intermediate results > and thus allow efficient for-loop iterations? Caching intermediate results would also greatly help projects that are targeting Flink as a

Re: Dataset split/demultiplex

2016-05-13 Thread Gábor Gévay
> > > if (predicate(i)) >> > > > >> > > > trueEvents.collect(i) >> > > > >> > > > else >> > > > >> > > > falseEvents.collect(i) >> > > > >> >

Re: Dataset split/demultiplex

2016-05-12 Thread Gábor Gévay
Hello, You can split a DataSet into two DataSets with two filters: val xs: DataSet[A] = ... val split1: DataSet[A] = xs.filter(f1) val split2: DataSet[A] = xs.filter(f2) where f1 and f2 are true for those elements that should go into the first and second DataSets respectively. So far, the
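The two-filter idea in the message above can be sketched with plain Java collections. This is only an analogy under stated assumptions: it uses `java.util.stream` rather than Flink's DataSet API, and the names `TwoFilterSplit` and `keep` are hypothetical helpers introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TwoFilterSplit {
    // Hypothetical helper: returns the elements of xs for which f is true.
    // Stands in for DataSet.filter; it is not part of any Flink API.
    static <A> List<A> keep(List<A> xs, Predicate<A> f) {
        return xs.stream().filter(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);

        // f1 selects the elements of the first split; f2 is chosen here as
        // its negation, so every element lands in exactly one split.
        Predicate<Integer> f1 = i -> i % 2 == 0;
        Predicate<Integer> f2 = f1.negate();

        // The input is traversed once per filter, i.e. twice in total.
        List<Integer> split1 = keep(xs, f1);
        List<Integer> split2 = keep(xs, f2);

        System.out.println(split1);
        System.out.println(split2);
    }
}
```

Choosing `f2` as the negation of `f1` is one option; nothing in the message requires the two predicates to be complementary.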

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Gábor Gévay
Hello, There are at least three Gábors in the Flink community, :) so assuming that the Gábor in the list of maintainers of the DataSet API is referring to me, I'll be happy to do it. :) Best, Gábor G. 2016-05-10 11:24 GMT+02:00 Stephan Ewen : > Hi everyone! > > We propose

Re: [DISCUSS] Macro-benchmarking for performance tuning and regression detection

2016-04-09 Thread Gábor Gévay
Hello, I think that creating a macro-benchmarking module would be a very good idea. It would make doing performance-related changes much easier and safer. I have also used Peel, and can confirm that it would be a good fit for this task. > I've also been looking recently at some of the hot code

Re: Guarantees for object reuse modes and documentation

2016-02-20 Thread Gábor Gévay
Thanks, Ken! I was wondering how other systems handle these issues. Fortunately, the deep copy - shallow copy problem doesn't arise in Flink: when we copy an object, it is always a deep copy (at least, I hope so :)). Best, Gábor 2016-02-19 22:29 GMT+01:00 Ken Krugler

Re: Memory manager behavior in iterative jobs

2016-02-02 Thread Gábor Gévay
parameter to true is the original behavior and does not have > any downside effects for batch programs. > > The effect of the switch on the performance of iterative jobs is > interesting and it sounds like it should be improved. > > Best, Fabian > > 2016-01-30 14:04 GMT+01:00 Gá

Memory manager behavior in iterative jobs

2016-01-30 Thread Gábor Gévay
Hello! We have a strangely behaving iterative Flink job: when we give it more memory, it gets much slower (more than 10 times). The problem seems to be mostly caused by GCs. Enabling object reuse didn’t help. With some profiling and debugging, we traced the problem to the operators requesting

Re: Object reuse documentation should be improved

2015-12-14 Thread Gábor Gévay
> these and I think it can be tricky to know whether stuff is chained or not > (for users, and even for us developers…). > > >> On 13 Dec 2015, at 19:24, Gábor Gévay <gga...@gmail.com> wrote: >> >> Hello, >> >> I find the documentation about object r

Object reuse documentation should be improved

2015-12-13 Thread Gábor Gévay
Hello, I find the documentation about object reuse [1] very confusing. I started a Google Doc [2] about clarifying/rewriting it. First, it states four questions that I think should have answers stated explicitly in the documentation, and then lists some concrete problems (ambiguities) in the

Hash-based aggregation

2015-10-01 Thread Gábor Gévay
Hello, I would really like to see FLINK-2237 solved. I would implement this feature over the weekend, if the CompactingHashTable can be used to solve it (see my comment there). Could you please give me some advice on whether this is a viable approach, or you perhaps see some difficulties that I'm

Re: Streaming KV store abstraction

2015-09-08 Thread Gábor Gévay
Hello, As for use cases, in my old job at Ericsson we were building a streaming system that was processing data from telephone networks, and it was using key-value stores a LOT. For example, keeping track of various state info of the users (which cell are they currently connected to, what bearers

Flink Forward webpage down

2015-08-14 Thread Gábor Gévay
Hello, I would like to submit an abstract to Flink Forward, but the webpage of the conference (flink-forward.org) seems to be down. It prints Error establishing a database connection for me. It worked yesterday. Best regards, Gabor

Re: A soft reminder

2015-07-30 Thread Gábor Gévay
Hi, I have also run into this problem just now. It only happens with large amounts of data. Best regards, Gabor 2015-07-27 11:35 GMT+02:00 Felix Neutatz neut...@googlemail.com: Hi, I also encountered the EOF exception for a delta iteration with more data. With less data it works ... Best regards,

Re: A soft reminder

2015-07-30 Thread Gábor Gévay
Yes, in a VertexCentricIteration with a few million nodes, running locally on my laptop with about 10 GB of memory given to java. Best, Gabor 2015-07-30 18:32 GMT+02:00 Andra Lungu lungu.an...@gmail.com: Hi Gabor, Within a delta iteration right? On Thu, Jul 30, 2015 at 6:31 PM, Gábor

Re: A soft reminder

2015-07-30 Thread Gábor Gévay
and this parameter deactivates its usage. I am very curious what happens in your case. If you could tell us the outcome, I'd greatly appreciate it. On Thu, Jul 30, 2015 at 7:17 PM, Gábor Gévay gga...@gmail.com wrote: Yes, in a VertexCentricIteration with a few million nodes, running

Failing tests on Windows

2015-07-17 Thread Gábor Gévay
Hello! I tried to set up a development environment on Windows, but several tests are failing: 1. The setWritable problem. This will be worked around by [1] 2. The tryCleanupOnError before close problem [2]. This could be half-fixed by doing fix 2. in the comment I wrote there, but I think

Re: Failing tests on Windows

2015-07-17 Thread Gábor Gévay
BlobUtilsTest.before:45 null BlobUtilsTest.before:45 null BlobServerDeleteTest.testDeleteFails:291 null BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not remove write permissions from cache directory BlobServerPutTest.testPutBufferFails:224 null

Re: Thoughts About Streaming

2015-06-25 Thread Gábor Gévay
at some point the pre-reducer seems to go haywire and does not recover from it. The good thing is that it does produce results now, where the previous Current/Reduce would simply hang and not produce any output. On Thu, 25 Jun 2015 at 12:02 Gábor Gévay gga...@gmail.com wrote: Hello, Aljoscha

Re: Thoughts About Streaming

2015-06-25 Thread Gábor Gévay
Hello, Aljoscha, can you please try the performance test of Current/Reduce with the InversePreReducer in PR 856? (If you just call sum, it will use an InversePreReducer.) It would be an interesting test, because the inverse function optimization really depends on the stream being ordered, and I

Re: Question: SourceFunction

2015-06-22 Thread Gábor Gévay
Hi, There is one more tricky issue here if the variable is not volatile, which can cause a problem on any architecture: If the compiler determines that the code inside the loop will never modify isRunning, then it might optimize the exit condition into just while(true). And this can actually
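A minimal sketch of the pattern discussed in the message above, assuming a plain Java thread rather than Flink's actual SourceFunction. The class names `VolatileFlagSketch` and `CountingSource` and the method `stopsAfterCancel` are hypothetical and only illustrate why the flag is declared `volatile`.

```java
import java.util.concurrent.TimeUnit;

public class VolatileFlagSketch {
    // Hypothetical stand-in for a source with a run loop; not a Flink class.
    static final class CountingSource implements Runnable {
        // volatile requires every loop iteration to re-read the field, so the
        // compiler may not hoist the check and reduce the loop to while (true).
        private volatile boolean isRunning = true;
        private long emitted = 0;

        @Override
        public void run() {
            while (isRunning) {
                emitted++; // placeholder for emitting a record
            }
        }

        // Called from another thread to ask the loop to stop.
        void cancel() {
            isRunning = false;
        }
    }

    // Starts the source, cancels it from the calling thread, and reports
    // whether the worker thread left its loop within five seconds.
    static boolean stopsAfterCancel() {
        CountingSource source = new CountingSource();
        Thread worker = new Thread(source);
        worker.setDaemon(true);
        worker.start();
        try {
            TimeUnit.MILLISECONDS.sleep(50);
            source.cancel();
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("loop exited after cancel: " + stopsAfterCancel());
    }
}
```

Without the `volatile` modifier the same program may or may not terminate, depending on the JVM and on how the loop gets compiled, which is what makes the bug hard to reproduce.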

Adding flink-scala as a dependency to flink-streaming-core

2015-06-10 Thread Gábor Gévay
Hello, I would like to ask if it would be OK if I added flink-scala as a dependency to flink-streaming-core. An alternative would be to move the Scala typeutils to flink-core (to where the Java typeutils are). Why I need this: While I am implementing the fast median calculation for windows as

Re: Adding flink-scala as a dependency to flink-streaming-core

2015-06-10 Thread Gábor Gévay
at 2:13 PM Gábor Gévay gga...@gmail.com wrote: Hello, I would like to ask if it would be OK if I added flink-scala as a dependency to flink-streaming-core. An alternative would be to move the Scala typeutils to flink-core (to where the Java typeutils are). Why I need this: While I am

Re: Adding flink-scala as a dependency to flink-streaming-core

2015-06-10 Thread Gábor Gévay
can estimate how likely this is? Best regards, Gabor 2015-06-10 15:55 GMT+02:00 Gábor Gévay gga...@gmail.com: it does not feel right to add an API package to a core package Yes, that makes sense. I just tried removing the flink-java dependency from flink-streaming to see what needs

Re: Memleak in the SessionWindowing example

2015-06-08 Thread Gábor Gévay
at 10:01 PM, Gábor Gévay gga...@gmail.com wrote: Let's not get all dramatic :D Ok, sorry :D If we don't call any methods on the empty groups we can still keep them off-memory in a persistent storage with a lazy checkpoint/state-access logic with practically 0 memory overhead. So you

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gábor Gévay
? On Thu, May 28, 2015 at 3:01 PM, Gábor Gévay gga...@gmail.com wrote: Hi, At Ericsson, we are implementing something similar to what the SessionWindowing example does: There are events belonging to phone calls (sessions), and every event has a call_id, which tells us which call

Re: GSoC proposal

2015-03-26 Thread Gábor Gévay
://gist.github.com/debasishg/8172796 Cheers, Gyula On Thu, Mar 26, 2015 at 7:50 PM, Gábor Gévay gga...@gmail.com wrote: Hello, I will be applying to the Google Summer of Code, and I wrote most of the proposal: http://compalg.inf.elte.hu/~ggevay/Proposal.pdf I would appreciate it if you could