Reflection overhead in KafkaIO, profiling snapshot attached

2021-02-13 Thread Teodor Spæren
. I'm not so familiar with the KafkaIO connector, but I was hoping someone here might know how to diagnose this further :) / Teodor Spæren Attached snapshot: Hot Spots. Session: Remote attach. Time of export: Saturday, February 13, 2021 3:51:32 PM CET. JVM time: 06:24. Thread selection

Nexmark coderStrategy AVRO or JAVA causes crashes if numEvents is above 991683

2021-01-15 Thread Teodor Spæren
teArrayCoder.java:41) at org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:118) ... 18 more Does anyone have any ideas as to what this might be or how I could go about debugging this? Any help is much appreciated. Regards Teodor Spæren

Nexmark ratelimiting not working

2021-01-07 Thread Teodor Spæren
Hello and happy new year! I've been trying to use Nexmark for evaluating some performance improvements I've made to Beam. I've discovered a problem where Nexmark won't work if you enable the rate-limiting mode. I've filed a bug report for this[1]. Reproducing the problem is easy,

How to configure logging in nexmark test

2021-01-01 Thread Teodor Spæren
Hey! My question is about how to turn on DEBUG and TRACE logging when running the nexmark test suite via: ./gradlew :sdks:java:testing:nexmark:run I've found the file sdks/java/testing/nexmark/src/main/resources/log4j.properties, but modifying it seems to have no effect on the output.

Re: Help measuring upcoming performance increase in flink runner on production systems

2020-12-21 Thread Teodor Spæren
add your variant as a new benchmark and compare the difference across many runs in a controlled benchmarking environment. Would that help? Ahmet [1] http://metrics.beam.apache.org/d/1/getting-started?orgId=1 On Tue, Dec 15, 2020 at 5:48 AM Teodor Spæren wrote: Hey! Yeah, that paper was what

Re: Help measuring upcoming performance increase in flink runner on production systems

2020-12-15 Thread Teodor Spæren
or methodologies to consider as you explore this a bit more: https://arxiv.org/pdf/1907.08302.pdf I’m looking forward to reading about your findings, especially using a more recent iteration of Beam! Rion On Dec 14, 2020, at 6:37 AM, Teodor Spæren wrote: Just bumping this so people see it now that 2.26.0

Re: Help measuring upcoming performance increase in flink runner on production systems

2020-12-14 Thread Teodor Spæren
Just bumping this so people see it now that 2.26.0 is out :) On Wed, Nov 25, 2020 at 11:09:52AM +0100, Teodor Spæren wrote: Hey! My name is Teodor Spæren and I'm writing a master's thesis investigating the performance overhead of using Beam instead of using the underlying systems directly. My

Help measuring upcoming performance increase in flink runner on production systems

2020-11-25 Thread Teodor Spæren
Hey! My name is Teodor Spæren and I'm writing a master's thesis investigating the performance overhead of using Beam instead of using the underlying systems directly. My focus has been on Flink, and I've made a discovery about some unnecessary copying between operators in the Flink runner[1][2

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-11-02 Thread Teodor Spæren
- Teodor Spæren On Tue, Nov 03, 2020 at 07:31:47AM +0100, Teodor Spæren wrote: Hey Jan! I have since created a PR and Jira issue for this and I've now run the Nexmark suite with the change applied and not applied. This is just a quick result, but it is very promising! Here is without

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-11-02 Thread Teodor Spæren
l continue to look at this, but these are some strong results I think! - Teodor On Tue, Oct 27, 2020 at 01:53:11PM +0100, Jan Lukavský wrote: Hi, I tend to be +1 for the flag, but before that, we might want to have a deeper analysis of the performance impact. I believe the penalty will be (in percent

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-29 Thread Teodor Spæren
Thanks Jan, this cleared some things up! Best regards, Teodor Spæren On Thu, Oct 29, 2020 at 02:13:50PM +0100, Jan Lukavský wrote: Hi Teodor, the confusion here may come from the fact that there are two (logical) representations of an element in PCollection. One representation

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-29 Thread Teodor Spæren
Hey Jan! I fully agree! Best regards, Teodor Spæren On Thu, Oct 29, 2020 at 09:00:33AM +0100, Jan Lukavský wrote: Hi Teodor and Max, I think that there is not 100% need for all runners to behave exactly the same way. The reason for that is that different runners can have different purposes

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-29 Thread Teodor Spæren
and all tests passed, so there is no such test in there. Depending on the reading above, we should add such tests to all runners. Best regards, Teodor Spæren On Thu, Oct 29, 2020 at 10:16:30AM +0100, Maximilian Michels wrote: Ok then we are on the same page, but I disagree with your conclusion

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-29 Thread Teodor Spæren
way to go, to not break existing pipelines. Best regards, Teodor Spæren On Wed, Oct 28, 2020 at 07:29:06PM +0100, Maximilian Michels wrote: You are right that Flink serializers do not care to copy for immutable Java types, e.g. Long, Integer, String. However, Pojos or other custom types can

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-28 Thread Teodor Spæren
which use CoderTypeSerializer don't pass down the pipeline options. I can go through each and try to pass them down, but is there an easier way? Some global variable or an earlier point where we could do this? Or simply just remove the constructor without pipeline options? Best Regards, Teodor Spæren On T

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-28 Thread Teodor Spæren
"fasterCopy" or "disableFailsafeCopying". Best regards, Teodor Spæren [1]: https://beam.apache.org/documentation/programming-guide/#pcollection-characteristics [2]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/execution_configuration.html [3]: https://beam.apache.o

Contributor permissions for Beam Jira tickets

2020-10-27 Thread Teodor Spæren
the reporter of [1]. Hope to hear back. Best regards, Teodor Spæren [1] https://issues.apache.org/jira/browse/BEAM-11146

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread Teodor Spæren
;) ) Best regards, Teodor Spæren On Tue, Oct 27, 2020 at 01:53:11PM +0100, Jan Lukavský wrote: Hi, I tend to be +1 for the flag, but before that, we might want to have a deeper analysis of the performance impact. I believe the penalty will be (in percentage) much lower in cases of more practical

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread Teodor Spæren
implementing a semantic guarantee that the Beam model explicitly doesn't support. Best regards, Teodor Spæren [1]: https://beam.apache.org/documentation/runners/direct/ On Tue, Oct 27, 2020 at 12:08:51PM +0100, Teodor Spæren wrote: Hey David, I think I might have worded this poorly, because

Re: Possible 80% reduction in overhead for flink runner, input needed

2020-10-27 Thread Teodor Spæren
that we could simply gate this behind an option on the Flink runner. I also tried to search for this before, but did not find any mention of it. Can you link me to some discussions about this in the past? Thanks for the reply :D Best regards, Teodor Spæren [1]: https://beam.apache.org/documentation

Possible 80% reduction in overhead for flink runner, input needed

2020-10-26 Thread Teodor Spæren
existing pipelines which rely on the Flink runner saving them from not following the immutability guarantee. I see this as a small loss, as they are relying on an implementation detail of the Flink runner. I hope I have explained this adequately and eagerly await any feedback :) Best regards, Teodor

Re: Design rationale behind copying via serializing in flink runner

2020-09-06 Thread Teodor Spæren
%40%3Cdev.beam.apache.org%3E On Mon, Aug 31, 2020 at 11:14 AM Teodor Spæren wrote: Hey! First time posting to a mailing list, hope I did it correctly :) I'm writing a master's thesis at the University of Oslo and right now I'm looking at the performance overhead of using Beam with the Flink

Design rationale behind copying via serializing in flink runner

2020-08-31 Thread Teodor Spæren
it. Best regards, Teodor Spæren [1]: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/CoderUtils.java#L140 [2]: https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/runners/flink/1.8/src/main/java/org/apache/beam/runners