Re: On my activity at the project

2017-01-22 Thread Maximilian Michels
Frances! > > On Tue, 17 Jan 2017 at 18:53 Kenneth Knowles <k...@google.com.invalid> wrote: > >> Great to work with you so far, and looking forward to it in the future. >> Enjoy your time off! >> >> Kenn >> >> On Sat, Jan 14, 2017 at 12:0

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-23 Thread Maximilian Michels
+1 Thanks for looking into it! On 23.11.17 00:25, Lukasz Cwik wrote: > I have noticed that some e-mail addresses (notably @google.com) get > .INVALID suffixed onto it so per...@yyy.com become per...@yyy.com.INVALID > in the From: header. > > I have figured out that this is an issue with the way

Re: Apache Beam Python Wheels Repository

2018-08-15 Thread Maximilian Michels
+1 Travis for building the Python wheels looks fine to me. Many Apache projects use Travis in addition to Jenkins. Apache is also invested in Travis [1] to ensure the build capacity is sufficient. In any case, we could migrate away from Travis if it doesn't work out as expected. We don't have to

Re: Test failures list

2018-08-16 Thread Maximilian Michels
Thank you Mikhail for looking into test failures and compiling the list! > I cannot access this link. Is it publicly accessible? Works for me but it takes a while to show results. > One general question: maybe it's a good idea to assign change > authors/code owners to the issues? Or just reach

Re: Process JobBundleFactory for portable runner

2018-08-16 Thread Maximilian Michels
Makes sense to have an option to run the SDK harness in a non-dockerized environment. I'm in the process of creating a Docker entry point for Flink's JobServer[1]. I suppose you would also prefer to execute that one standalone. We should make sure this is also an option. [1]

Re: Metrics architecture inside the runners

2018-08-16 Thread Maximilian Michels
Hi Etienne, Great overview. Thank you! When do we plan to document Metrics for users? Perhaps I should open a JIRA issue. Cheers, Max On 16.08.18 12:22, Etienne Chauchot wrote: > Hi folks ! > > I've created a page in the new Beam wiki for contributors: > >

Bootstrapping Beam's Job Server

2018-08-20 Thread Maximilian Michels
Hi everyone, I wanted to get your opinion on the Job-Server startup [1] which is part of the portability story. I've created a docker container to bring up Beam's Job Server, which is the entry point for pipeline execution. Generally, this works fine when the backend (Flink in this case) runs

Re: Beam application upgrade on Flink crashes

2018-08-20 Thread Maximilian Michels
AFAIK the serializer used here is the CoderTypeSerializer which may not be recoverable because of changes to the contained Coder (TaggedKvCoder). It doesn't currently have a serialVersionUID, so even small changes could break serialization backwards-compatibility. As of now Beam doesn't offer the
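To illustrate the serialVersionUID point above, here is a minimal sketch in plain Java. It assumes nothing about Beam's actual CoderTypeSerializer; `PinnedSerializer` and its `coderName` field are hypothetical stand-ins. It only shows the general Java mechanism: a Serializable class that declares an explicit serialVersionUID keeps a stable stream identity, whereas without the declaration the JVM derives one from the class structure, so small code changes can alter it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionSketch {

  // Hypothetical stand-in for a serializer wrapping a coder; NOT Beam's CoderTypeSerializer.
  public static class PinnedSerializer implements Serializable {
    // Explicit UID: stays the same when methods or fields are added later.
    private static final long serialVersionUID = 1L;
    public final String coderName;

    public PinnedSerializer(String coderName) {
      this.coderName = coderName;
    }
  }

  // Returns the UID that Java serialization writes into the stream for this class.
  public static long streamUid(Class<?> clazz) {
    return ObjectStreamClass.lookup(clazz).getSerialVersionUID();
  }

  // Serializes and deserializes the object in memory, as a checkpoint/restore would.
  public static PinnedSerializer roundTrip(PinnedSerializer in) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(in);
      }
      try (ObjectInputStream input =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (PinnedSerializer) input.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("stream UID: " + streamUid(PinnedSerializer.class));
    System.out.println("restored: " + roundTrip(new PinnedSerializer("kv-coder")).coderName);
  }
}
```

Pinning the UID only removes one source of incompatibility; a change to the serialized fields themselves can still break restoring old state.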

Re: Status of IntelliJ with Gradle

2018-08-20 Thread Maximilian Michels
about gradle > improvements and I just split it in several tickets. Here is the one > concerning the same issue: https://issues.apache.org/jira/browse/BEAM-5176 > > Etienne > > On Monday, 20 August 2018 at 15:51 +0200, Maximilian Michels wrote: >> Hi Beamers, >> >> It

Re: Metrics architecture inside the runners

2018-08-17 Thread Maximilian Michels
the regular website would be needed. > You're right, please file a jira. > > Etienne > > On Thursday, 16 August 2018 at 18:24 +0200, Maximilian Michels wrote: >> Hi Etienne, >> >> Great overview. Thank you! >> >> When do we plan to document Metrics for u

Re: Discussion: Scheduling across runner and SDKHarness in Portability framework

2018-08-17 Thread Maximilian Michels
Hi Ankur, Thanks for looking into this problem. The cause seems to be Flink's pipelined execution mode. It runs multiple tasks in one task slot and produces a deadlock when the pipelined operators schedule the SDK harness DoFns in non-topological order. The problem would be resolved if we

Re: Bootstrapping Beam's Job Server

2018-08-20 Thread Maximilian Michels
Thanks for your suggestions. Please see below. > Option 3) would be to map in the docker binary and socket to allow > the containerized Flink job server to start "sibling" containers on > the host. Do you mean packaging Docker inside the Job Server container and mounting /var/run/docker.sock

Re: Bootstrapping Beam's Job Server

2018-08-21 Thread Maximilian Michels
ture plans to deploy Flink TMs via Kubernetes. Thanks, Thomas [1] https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <m...@apache.org> wrote:

Re: Beam Docs Contributor

2018-08-21 Thread Maximilian Michels
That sounds great, Rose. Welcome! On 21.08.18 09:21, Etienne Chauchot wrote: > Welcome Rose ! > > Etienne > > On Monday, 30 July 2018 at 10:10 -0700, Thomas Weise wrote: >> Welcome Rose, and looking forward to the docs update! >> >> On Mon, Jul 30, 2018 at 9:15 AM Henning Rohde >

Re: Process JobBundleFactory for portable runner

2018-08-21 Thread Maximilian Michels
For reference, here is corresponding JIRA issue for this thread: https://issues.apache.org/jira/browse/BEAM-5187 On 16.08.18 11:15, Maximilian Michels wrote: Makes sense to have an option to run the SDK harness in a non-dockerized environment. I'm in the process of creating a Docker entry

Re: Status of IntelliJ with Gradle

2018-08-22 Thread Maximilian Michels
iml beforehand to add the vendored jar file as the top dependency (jar never appears in the modules dependencies) On Mon, Aug 20, 2018 at 8:36 AM Maximilian Michels <m...@apache.org> wrote: Thank you Etienne for opening the issue. Anyone else having problems with t

Re: Bootstrapping Beam's Job Server

2018-08-23 Thread Maximilian Michels
ger worker creation > (deletion?), passing the requisite parameters (e.g. the fn api > endpoints). Aren't you making up more features now? :) Couldn't this be also handled by the shell script? On 23.08.18 14:13, Robert Bradshaw wrote: On Thu, Aug 23, 2018 at 1:54 PM Maximilian Michels wro

Re: [Proposal] Track non-code contributions in Jira

2018-08-24 Thread Maximilian Michels
+1 Code is just one part of a successful open-source project. As long as the tasks are properly labelled and actionable, I think it works to put them into JIRA. On 24.08.18 15:09, Matthias Baetens wrote: I fully agree and think it is a great idea. I think that, next to visibility and

Re: [Perk] Sharing the love for Flink Forward

2018-08-27 Thread Maximilian Michels
Just wanted to chime in here and say that Flink Forward is a great conference. You get to meet lots of people from the Flink community from all over the world, committers as well as end users. There are awesome talks as well. Plus, you get to travel to Berlin which, if you haven't been, I

Re: Bootstrapping Beam's Job Server

2018-08-27 Thread Maximilian Michels
figuration. On 23.08.18 17:07, Robert Bradshaw wrote: On Thu, Aug 23, 2018 at 3:47 PM Maximilian Michels wrote: > Going down this path may start to get fairly involved, with an almost > endless list of features that could be requested. Instead, I would > suggest we keep process-based ex

Re: Bootstrapping Beam's Job Server

2018-08-27 Thread Maximilian Michels
endpoints). This could be implemented as a script that goes and makes the call and exits, but I think this would be common enough it'd be worth building in, and also useful enough for testing that it should be very lightweight. On Mon, Aug 27, 2018 at 10:51 AM Maximilian Michels wrote: Robert, just t

Re: Process JobBundleFactory for portable runner

2018-08-27 Thread Maximilian Michels
onfigured, additional steps such as virtualenv activate or setting of other environment variables can be included as well. On Thu, Aug 23, 2018 at 5:15 AM Maximilian Michels <m...@apache.org> wrote: Just to recap: From this and the other thread ("Boot

Re: Process JobBundleFactory for portable runner

2018-08-23 Thread Maximilian Michels
dea where the SDK/user starts the SDK harnesses instead of the runner. Each runner may not support all types of environments. Henning On Tue, Aug 21, 2018 at 2:52 AM Maximilian Michels <m...@apache.org> wrote:

Re: Bootstrapping Beam's Job Server

2018-08-23 Thread Maximilian Michels
enabling docker in their clusters. On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels <m...@apache.org> wrote: > > Thanks Henning and Thomas. It looks like > > a) we want to keep the Docker Job Server Docker container and

Re: Process JobBundleFactory for portable runner

2018-08-23 Thread Maximilian Michels
y add the external idea where the SDK/user starts the SDK harnesses instead of the runner. Each runner may not support all types of environments. Henning On Tue, Aug 21, 2018 at 2:52 AM Maximilian Michels <m...@apache.org>

Status of IntelliJ with Gradle

2018-08-20 Thread Maximilian Michels
Hi Beamers, It's great to see the Beam build system overhauled. Thank you for all the hard work. That said, I've just started contributing to Beam again and I feel really stupid for not having a fully-functional IDE. I've closely followed the IntelliJ/Gradle instructions [1]. In the terminal

Status of IntelliJ with Gradle

2018-08-20 Thread Maximilian Michels
Hi Beamers, It's great to see the Beam build system overhauled. Thank you for all the hard work. That said, I've just started contributing to Beam again and I feel really stupid for not having a fully-functional IDE. I've closely followed the IntelliJ/Gradle instructions [1]. In the terminal

Re: Status of IntelliJ with Gradle

2018-08-20 Thread Maximilian Michels
Sorry, please disregard this duplicate mail. The Apache mail relay was flaky and my client doesn't seem to handle it particularly well. On 20.08.18 15:51, Maximilian Michels wrote: > Hi Beamers, > > It's great to see the Beam build system overhauled. Thank you for all > t

Re: Status of IntelliJ with Gradle

2018-08-30 Thread Maximilian Michels
ult, I have to run and debug only using gradle for now. Thanks, Xinyu On Wed, Aug 22, 2018 at 1:45 AM, Maximilian Michels wrote: Thanks Lukasz. I also found that I can never fix all import errors by manually adding jars to the IntelliJ library list. It is also not a good solution because it bre

Re: Python 3: final step

2018-09-07 Thread Maximilian Michels
This has been requested multiple times. Thanks for working on the Python 3 story. Let me know if I can help out in any way! On 05.09.18 19:01, Valentyn Tymofieiev wrote: This is awesome! Kudos to Robbe and Matthias who have been pushing this forward! On Wed, Sep 5, 2018 at 9:45 AM Charles

Re: [NEW CONTRIBUTOR] ElasticsearchIO now supports Elasticsearch v6.x

2018-09-07 Thread Maximilian Michels
Well done. Thank you, Dat! On 06.09.18 22:47, Trần Thành Đạt wrote: Thank you. Etienne Chauchot and Tim Robertson helped me a lot to get familiar with Beam code. On Fri, Sep 7, 2018 at 2:59 AM Thomas Weise wrote: Support for Elastic 6.x is really good to have.

Re: PR/6343: Adding support for MustFollow

2018-09-10 Thread Maximilian Michels
This is a great idea but I share Lukasz' doubts about this being a universal solution for awaiting some action in a pipeline. I wonder, wouldn't it work to not pass in a PCollection, but instead wrap a DoFn which internally ensures the correct triggering behavior? All runners which correctly

Re: SplittableDoFn

2018-09-10 Thread Maximilian Michels
Thanks for moving forward with this, Lukasz! Unfortunately, can't make it on Friday but I'll sync with somebody on the call (e.g. Ryan) about your discussion. On 08.09.18 02:00, Lukasz Cwik wrote: Thanks for everyone who wanted to fill out the doodle poll. The most popular time was Friday

Re: [FYI] Paper of Building Beam Runner for IBM Streams

2018-09-10 Thread Maximilian Michels
Excellent write-up. Thank you! On 09.09.18 20:43, Jean-Baptiste Onofré wrote: Good idea. It could also help people who want to create runners. Regards JB On 09/09/2018 13:00, Manu Zhang wrote: Hi all, I've spent the weekend reading Challenges and Experiences in Building an Efficient Apache

Re: [Call for items] September Beam Newsletter

2018-09-10 Thread Maximilian Michels
Good stuff! Left some items for the Flink Runner. On 08.09.18 02:14, Rose Nguyen wrote: *bump* Celebrate the weekend by sharing with the community your talks, contributions, plans, etc! On Wed, Sep 5, 2018 at 10:25 AM Rose Nguyen wrote: Hi Beamers:

Re: PTransforms and Fusion

2018-09-10 Thread Maximilian Michels
A) What should we do with these "empty" PTransforms? We can't translate them, so dropping them seems the most reasonable choice. Should we throw an error/warning to make the user aware of this? Otherwise might be unexpected for the user. A3) Handle the "empty" PTransform case within all of

Re: builds.apache.org refused connections since last night

2018-08-31 Thread Maximilian Michels
Jenkins is up again! (woho!) On 30.08.18 20:23, Thomas Weise wrote: I would be concerned with multiple folks running the Jekyll build locally to end up with inconsistent results. But if Jenkins stays down for longer, then maybe one of us can be the Jenkins substitute :) On Thu, Aug 30, 2018

Re: Beam Schemas: current status

2018-08-31 Thread Maximilian Michels
also assume that there's a default constructor). I can remove this restriction if there is an appropriate constructor or builder interface that lets us construct the object directly. Reuven On Thu, Aug 30, 2018 at 6:51 AM Maximilian Michels <m...@apache.org> wrote:

Re: Beam Schemas: current status

2018-08-31 Thread Maximilian Michels
this should only do its magic if there is only one possible way to feed data to the constructor. That's why a dedicated interface would be the easier and safer way to opt-in. On 31.08.18 11:27, Robert Bradshaw wrote: On Fri, Aug 31, 2018 at 11:22 AM Maximilian Michels <m...@apache.org>

Re: Gradle Races in beam-examples-java, beam-runners-apex

2018-09-11 Thread Maximilian Michels
Do we have inotifywait available on Travis and could set it up to log concurrent access to the relevant Jar files? On 10.09.18 22:41, Lukasz Cwik wrote: I had originally suggested to use some Linux kernel tooling such as inotifywait[1] to watch what is happening. It is likely that we have

[Discuss] Upgrade story for Beam's execution engines

2018-09-11 Thread Maximilian Michels
Hi Beamers, In the light of the discussion about Beam LTS releases, I'd like to kick off a thread about how often we upgrade the execution engine of each Runner. By upgrade, I mean major/minor versions which typically break the binary compatibility of Beam pipelines. For the Flink Runner,

Re: Is there any way to ask the runner to call finalizeCheckpoint() method before it closed the Reader?

2018-10-05 Thread Maximilian Michels
connection) is closed. So it makes no sense to call the finalizeCheckpoint() method after closing the Reader. On Fri, Oct 5, 2018 at 9:01 PM Maximilian Michels <m...@apache.org> wrote: Hi, Not sure whether I'm a guru but I'll try to answer your question ;) > I

Beam Summit community feedback

2018-10-05 Thread Maximilian Michels
Hi, What do you think about collecting some of the feedback from the community at Beam Summit last week? Here's what I've come across: * The Kubernetes / Docker Story Multiple users reported that they would like a Beam-Kubernetes story. What is the best way to deploy Beam with Kubernetes?

Re: Portable Flink runner: Generator source for testing

2018-10-08 Thread Maximilian Michels
(which still is really cool). Is this correct or am I missing something? Łukasz On Fri, 5 Oct 2018 at 14:04 Maximilian Michels <m...@apache.org> wrote: Thanks for sharing your setup. You're right that we need timers to continuously ingest data to the testing pipeline.

Re: Is there any way to ask the runner to call finalizeCheckpoint() method before it closed the Reader?

2018-10-08 Thread Maximilian Michels
documents, thanks! On Fri, Oct 5, 2018 at 11:01 PM Maximilian Michels <m...@apache.org> wrote: Restoring from a checkpoint is something different. You asked about acking pending CheckpointMarks. If you look in the PubsubUnboundedSource, it doesn't close t

Re: [DISCUSS] - Separate JIRA notifications to a new mailing list

2018-10-11 Thread Maximilian Michels
+1 I guess most people already have filters in place to separate commits and JIRA issues. JIRA really has nothing to do in the commits list. On 11.10.18 15:53, Kenneth Knowles wrote: +1 I've suggested the same. Canonical. On Thu, Oct 11, 2018, 06:19 Thomas Weise >

Re: Beam Samza Runner status update

2018-10-12 Thread Maximilian Michels
Thanks for the update, Xinyu and Hai! Great to see another Runner emerging :) I'm on the FlinkRunner. Looking forward to working together with you to make the Beam Runners even better. Particularly, we should sync on the portability, as some things are still to be fleshed out. In Flink, we

Re: [DISCUSS] Beam public roadmap

2018-10-12 Thread Maximilian Michels
Great idea, Kenn! How about putting the roadmap in the Confluence wiki? We can link the page from the web site. The timeline should not be too specific but should give users an idea of what to expect. On 10.10.18 22:43, Romain Manni-Bucau wrote: What about a link in the menu. It should

Re: [DISCUSS] - Separate JIRA notifications to a new mailing list

2018-10-15 Thread Maximilian Michels
tt Wegner <sc...@apache.org> wrote: +1, commits@ is too noisy to be useful currently. On Thu, Oct 11, 2018 at 8:04 AM Maximilian Michels <m...@apache.org> wrote:

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-15 Thread Maximilian Michels
I agree that the current approach breaks the pipeline options contract because "unknown" options get parsed in the same way as options which have been defined by the user. I'm not sure the `experiments` flag works for us. AFAIK it only allows true/false flags. We want to pass all types of

Re: [DISCUSS] Separate Jenkins notifications to a new mailing list

2018-10-16 Thread Maximilian Michels
+1 I can switch off all my filters then, and people new here will be less overwhelmed by email. On 16.10.18 12:46, Alexey Romanenko wrote: +1 On 16 Oct 2018, at 00:02, Chamikara Jayalath wrote: +1 for new lists. Thanks, Cham On Mon, Oct 15, 2018 at 12:09 PM

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-16 Thread Maximilian Michels
>> > SDKs that are starting off wouldn't need to "fetch" options, they could choose to not support runner options or they could choose to pass all options through to the runner blindly. Fetching the options only provides the SDK the

Re: a new contributor

2018-10-22 Thread Maximilian Michels
Hi Heejong, Thanks for introducing yourself! Welcome :) -Max On 19.10.18 21:45, Ankur Goenka wrote: Welcome Heejong! On Fri, Oct 19, 2018 at 12:27 PM Rui Wang wrote: Welcome! -Rui On Fri, Oct 19, 2018 at 11:55 AM Robin Qiu

Re: [DISCUSS] Move beam_SeedJob notifications to another email address

2018-10-22 Thread Maximilian Michels
Hi Rui, The seed job being broken is sort of a big deal because it prevents updates to our Jenkins jobs. However, it doesn't stop the existing test configurations from running. I haven't found the mails annoying but I'm ok with moving them to the builds@ list. -Max On 22.10.18 11:05, Colm

Re: [ANNOUNCE] New committers, October 2018

2018-10-22 Thread Maximilian Michels
Congrats Ankur and Xinyu! On 19.10.18 21:27, Rui Wang wrote: Congrats and thanks for your contributions! -Rui On Fri, Oct 19, 2018 at 11:55 AM Ahmet Altay wrote: Congratulations to both of you! :) On Fri, Oct 19, 2018 at 11:52 AM, Robin Qiu

Re: [DISCUSS] Move beam_SeedJob notifications to another email address

2018-10-22 Thread Maximilian Michels
arate job for whether it is run as an actual postcommit or whether it is run against a PR. The seed job has not been split this way. So when someone is testing the seed job on a PR failures look the same as if it is broken on master. Kenn On Mon, Oct 22, 2018 at 2:15 AM Maximilian Michels <

Re: Python docs build error

2018-10-22 Thread Maximilian Michels
Correction for the footnote: [1] https://github.com/apache/beam/pull/6637 On 22.10.18 15:24, Maximilian Michels wrote: Hi Colm, This [1] got merged recently and broke the "docs" target which apparently is not part of our Python PreCommit tests. See the following PR for a

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-22 Thread Maximilian Michels
+1 for publishing vendored Jars independently. It will improve build time and ease IntelliJ integration. Flink also publishes shaded dependencies separately: - https://github.com/apache/flink-shaded - https://issues.apache.org/jira/browse/FLINK-6529 AFAIK their main motivation was to get rid

Re: Python docs build error

2018-10-22 Thread Maximilian Michels
Hi Colm, This [1] got merged recently and broke the "docs" target which apparently is not part of our Python PreCommit tests. See the following PR for a fix: https://github.com/apache/beam/pull/6774 Best, Max [1] https://github.com/apache/beam/pull/6737 On 22.10.18 12:55, Colm O

Re: [ANNOUNCE] New committers & PMC members, Summer 2018 edition

2018-10-17 Thread Maximilian Michels
Great to see the community growing! On 16.10.18 18:20, Scott Wegner wrote: Congrats all! And thanks Kenn and the PMC for recognizing these contributions. On Mon, Oct 15, 2018 at 9:45 AM Kenneth Knowles wrote: Hi all, Since our last announcement in May, we

Re: [PROPOSAL] allow the users to anticipate the support of features in the targeted runner.

2018-10-17 Thread Maximilian Michels
This is a good idea. It needs to be fleshed out how the capability of a Runner would be visible to the user (apart from the compatibility matrix). A dry-run feature would be useful, i.e. the user can run an inspection on the pipeline to see if it contains any features which are not supported

Re: [PROPOSAL] allow the users to anticipate the support of features in the targeted runner.

2018-10-18 Thread Maximilian Michels
it as a validation plugin in the build system but IMHO it is already too late for the user. So, long story short, I'm more in favor of an IDE plugin or similar coding-time solution. Best Etienne On Wednesday, 17 October 2018 at 12:11 +0200, Maximilian Michels wrote: This is a good idea. It needs

Re: [PROPOSAL] allow the users to anticipate the support of features in the targeted runner.

2018-10-18 Thread Maximilian Michels
that they will be able to run their pipelines with a specific Runner. On 17.10.18 15:28, Robert Bradshaw wrote: On Wed, Oct 17, 2018 at 3:17 PM Kenneth Knowles <k...@apache.org> wrote: On Wed, Oct 17, 2018 at 3:12 AM Maximilian Michels <m...@apache.org> wrote: A dry-ru

Re: [Call for items] October Beam Newsletter

2018-10-16 Thread Maximilian Michels
Hi Rose, A bit late but since the newsletter does not seem to be out yet, I added some items for the Portable Flink Runner. Cheers, Max On 08.10.18 18:59, Rose Nguyen wrote: Hi Beamers: So much has been going on that it's time to sync up again in the October Beam Newsletter [1]! :) *Add

Re: Integrating Stateful DoFns from the Python SDK

2018-10-17 Thread Maximilian Michels
ypehints.KV[K, V]) \ | "statefulParDo" >> beam.ParDo(AddIndex()) Do you know a way to make 2) work, i.e. set the KvCoder for the Create? In the first example, the Create runs in a ParDo, in the second example On 17.10.18 15:34, Maximilian Michels wrote: Thanks Robert. I was

Re: Integrating Stateful DoFns from the Python SDK

2018-10-17 Thread Maximilian Michels
relates to a long-standing issue that the coder inference should be moved up into construction, or at least before we pass the graph to the runner.) On Wed, Oct 17, 2018 at 2:52 PM Maximilian Michels <m...@apache.org> wrote: Hi everyone, While integrating porta

Integrating Stateful DoFns from the Python SDK

2018-10-17 Thread Maximilian Michels
Hi everyone, While integrating portable state with the FlinkRunner, I hit a problem and wanted to get your opinion. Stateful DoFns require their input to be KV records. The reason for this is that state is isolated by key. The (non-portable) FlinkRunner uses Flink's `keyBy(key)` construct
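The statement that state is isolated by key can be sketched in plain Java, without any Beam or Flink API. Everything here is hypothetical: `KeyedStateSketch` and `process` are illustrative names, and the per-key counter only loosely mirrors what an index-assigning stateful DoFn might do. The point is simply that each key sees its own state cell, which is why the input has to carry a key in the first place.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedStateSketch {

  // One state cell per key; a runner would keep this in its keyed state backend.
  private final Map<String, Integer> seenByKey = new HashMap<>();

  // Assigns a running index to each value, counted separately for every key.
  public String process(String key, String value) {
    int index = seenByKey.merge(key, 1, Integer::sum) - 1;
    return key + ":" + value + "#" + index;
  }

  public static void main(String[] args) {
    KeyedStateSketch fn = new KeyedStateSketch();
    System.out.println(fn.process("a", "x"));
    System.out.println(fn.process("b", "y"));
    System.out.println(fn.process("a", "z"));
  }
}
```

An element without a key would give the runner nothing to partition this state by, which is the constraint the message describes.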

Re: Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #216

2018-10-23 Thread Maximilian Michels
I don't get the error locally when running: gradle :beam-sdks-python:lintPy27 Seems like there is a different configuration on Jenkins? On 23.10.18 10:16, Apache Jenkins Server wrote: See

Re: Python docs build error

2018-10-23 Thread Maximilian Michels
9 PM Maximilian Michels <m...@apache.org> wrote: Correction for the footnote: [1] https://github.com/apache/beam/pull/6637 On 22.10.18 15:24, Maximilian Michels wrote: > Hi Colm, > > This [1] got merged recently and

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-23 Thread Maximilian Michels
it seems pretty valuable to start on immediately. And I want to find out if there's a pitfall lurking. Max: what is the story behind having a separate flink-shaded repo? Did that make it easier to manage in some way? Kenn On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <mailt

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-24 Thread Maximilian Michels
build times a lot. It would be great if vendored deps could stay in the main repository. D. On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels <m...@apache.org> wrote: Looks great, Kenn!

Re: Unbalanced FileIO writes on Flink

2018-10-24 Thread Maximilian Michels
ne I used. On Tue, Oct 23, 2018 at 11:27 AM Maximilian Michels <m...@apache.org> wrote: Hi Jozef, This does not look like a FlinkRunner related problem, but is caused by the `WriteFiles` shar

Re: Data Preprocessing in Beam

2018-10-24 Thread Maximilian Michels
Welcome Alejandro! Interesting work. The sketching extension looks like a good place for your algorithms. -Max On 23.10.18 19:05, Lukasz Cwik wrote: Arnoud Fournier (afourn...@talend.com) started by adding a library to support sketching

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-24 Thread Maximilian Michels
I've run WordCount using Quickstart with the FlinkRunner (locally and against a Flink cluster). Would give a +1 but waiting to see what Kenn finds. -Max On 23.10.18 07:11, Ahmet Altay wrote: On Mon, Oct 22, 2018 at 10:06 PM, Kenneth Knowles wrote: You two did so

Re: Unbalanced FileIO writes on Flink

2018-10-26 Thread Maximilian Michels
rk? On Fri, Oct 26, 2018 at 11:26 AM Maximilian Michels <m...@apache.org> wrote: Oh ok, thanks for the pointer. Coming from Flink, the default is that the sharding is determined by the runtime distribution. Indeed, we will have to add an ove

Re: Beam Community Metrics

2018-10-29 Thread Maximilian Michels
Hi Scott, Thanks for sharing the progress. The test metrics are super helpful. I'm particularly looking forward to the PR metrics which could be useful for improving interaction within the community and with new contributors. -Max On 26.10.18 07:36, Scott Wegner wrote: I want to summarize

Re: Growing Beam -- A call for ideas? What is missing? What would be good to see?

2018-10-29 Thread Maximilian Michels
Hi Austin, Great initiative. I think there are already some materials out there but they are not consolidated: Cookbook with examples: https://github.com/apache/beam/tree/master/examples/java/src/main/java/org/apache/beam/examples/cookbook An interactive tutorial would be a great addition,

Re: Data Preprocessing in Beam

2018-10-29 Thread Maximilian Michels
algorithms in Scala? Or could I create a wrapper that interfaces with the sketching extension? Cheers. On Oct 24, 2018 15:00, Maximilian Michels wrote: Welcome Alejandro! Interesting work. The sketching extension looks like a good place for your algorithms. -Max On 23.10.18 19:05, Lukasz Cwik

Re: Python profiling

2018-10-29 Thread Maximilian Michels
This looks very helpful for debugging performance of portable pipelines. Great work! Enabling local directories for Flink or other portable Runners would be useful for debugging, e.g. per https://issues.apache.org/jira/browse/BEAM-5440 On 26.10.18 18:08, Robert Bradshaw wrote: Now that

Re: Unbalanced FileIO writes on Flink

2018-10-29 Thread Maximilian Michels
s not seem necessary. I don't recall why we made the choice of shard counts required in streaming mode. Perhaps because the bundles were too small (per key?) by default and we wanted to force more grouping? On Fri, Oct 26, 2018 at 3:32 PM Maximilian Michels wrote: Actually, I don't think setting

Accessing keyed state in portable timer callbacks

2018-10-31 Thread Maximilian Michels
Hi, I have a question regarding user state during timer callback in the FnApiDoFnRunner (Java SDK Harness). I've started implementing Timers for the portable Flink Runner. I can register a timer via the timer output collection and fire the timer via the timer input of the SDK Harness. But

Re: Accessing keyed state in portable timer callbacks

2018-11-01 Thread Maximilian Michels
null from currentElement/currentTimer but longer term I think we'll want a different solution. Alternatively, we could collapse currentElement and currentTimer to be currentElementOrTimer which would solve the accessor issue. On Wed, Oct 31, 2018 at 9:50 AM Maximilian Michels m

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-26 Thread Maximilian Michels
So if there is a blocker it would really be the Spark runner perf changes. Of course, all these except Dataflow are using local instances so may not be representative of larger scale AFAIK. >> >> Kenn >> >> On Wed, Oct 24, 2018 at 9:48 AM Maxim

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-26 Thread Maximilian Michels
> Is it an error or list of lists or concatenated. Similar issues for map types represented via JSON object {...} We can err to be on the safe side

Re: Flink 1.6 Support

2018-10-31 Thread Maximilian Michels
Hi Jins, As Thomas mentioned, the Flink Runner has already been prepared for Flink 1.6, you just have to change the Flink version in the Gradle build file. Of course this is not convenient because you can't fetch this version via Maven Central. So we're planning to release both versions:

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-25 Thread Maximilian Michels
version: 1.0.0 (first version and subsequent versions such as 1.0.1 are only for patch upgrades that fix any shading issues we may have had when producing the vendored jar) On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <m...@apache.org> wrote:

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-25 Thread Maximilian Michels
On 25.10.18 19:23, Lukasz Cwik wrote: On Thu, Oct 25, 2018 at 9:59 AM Maximilian Michels <m...@apache.org> wrote: Question: How would a user end up with the same shaded dependency twice? The shaded dependencies are transitive dependencies of Beam

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Maximilian Michels
Cool! I guess the underlying change is that the website can now be edited through the main repository and we don't have to go through "beam-site"? -Max On 25.10.18 12:20, Alexey Romanenko wrote: This is really cool feature! With a tab “Preview changes” it makes documentation updating much

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-11-05 Thread Maximilian Michels
The result shows that there is a demand for an LTS release. +1 for using an existing release. How about six months for the initial LTS release? I think it shouldn't be too long for the first one to give us a chance to make changes to the model. -Max On 02.11.18 17:26, Ahmet Altay wrote:

Re: [ANNOUNCE] New committer announcement, Euphoria edition

2018-11-02 Thread Maximilian Michels
Congrats David! Looking forward to seeing more awesome work on Euphoria/Beam. -Max On 02.11.18 09:23, Ismaël Mejía wrote: Congratulations, and thanks for all the hard work on making Euphoria Beam ready ! On Fri, Nov 2, 2018 at 12:06 AM Scott Wegner wrote: Congrats David! On Thu, Nov 1,

Re: Follow up ideas, to simplify creating MonitoringInfos.

2018-11-02 Thread Maximilian Michels
I was unable to get this to compile and could find no examples of this on the proto github. Would it be helpful to post the compiler output? -Max On 31.10.18 19:19, Lukasz Cwik wrote: I see and don't know how to help you beyond what you're already suggesting. From what I remember, maps were

Re: Never get spotless errors with this one weird trick

2018-11-02 Thread Maximilian Michels
Scott just separated the spotless check from the Java unit test precommit job, so you get faster feedback on spotless errors. Nice! +1 for the pre-commit hook. Have it set up. Unfortunately, it doesn't work with the GitHub merge button. Cheers, Max On 02.11.18 09:26, Ismaël Mejía wrote:

Re: Unbalanced FileIO writes on Flink

2018-10-26 Thread Maximilian Michels
u, Oct 25, 2018 at 12:01 PM Maximilian Michels <m...@apache.org> wrote: I agree it would be nice to keep the current distribution of elements instead of doing a shuffle based on an artificial shard key. Have you tried `withWindowedWrites()`? Also, why

Re: Unbalanced FileIO writes on Flink

2018-10-25 Thread Maximilian Michels
is a worker ID. Is this doable in beam model? On Wed, Oct 24, 2018 at 4:07 PM Maximilian Michels <m...@apache.org> wrote: The FlinkRunner uses a hash function (MurmurHash) on each key which places keys somewhere in the hash space. The hash space (2^32) is split
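A rough sketch of why hashing a handful of shard keys can leave workers unevenly loaded. This is not Flink's actual assignment code, and the message above is cut off before it says how the hash space is split; the sketch assumes equal contiguous ranges, one per worker. The mixing function is Murmur3's 32-bit finalizer used as a stand-in, and `workerFor`, `histogram` and the `shard-N` key names are hypothetical.

```java
import java.util.Arrays;

public class HashRangeSketch {

  // Murmur3 32-bit finalizer, used here only as a stand-in mixing function.
  static int mix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  // Maps a key to a worker by splitting the 2^32 hash space into equal ranges (an assumption).
  public static int workerFor(Object key, int parallelism) {
    long unsigned = mix(key.hashCode()) & 0xFFFFFFFFL;
    return (int) ((unsigned * parallelism) >>> 32);
  }

  // Counts how many of numKeys synthetic shard keys land on each worker.
  public static int[] histogram(int numKeys, int parallelism) {
    int[] counts = new int[parallelism];
    for (int k = 0; k < numKeys; k++) {
      counts[workerFor("shard-" + k, parallelism)]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    // With few keys the counts are often lumpy; with many keys they even out.
    System.out.println(Arrays.toString(histogram(5, 4)));
    System.out.println(Arrays.toString(histogram(100_000, 4)));
  }
}
```

With only as many shard keys as workers, nothing forces each key into a different range, so some workers may receive several shards and others none.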

Re: Does anyone have a strong intelliJ setup?

2018-10-19 Thread Maximilian Michels
cwiki.apache.org/confluence/display/BEAM/IntelliJ+Tips [2] https://cwiki.apache.org/confluence/display/BEAM/Eclipse+Tips On Thu, Oct 4, 2018 at 7:43 AM Maximilian Michels <m...@apache.org> wrote:

Re: Python SDK worker / portable Flink runner performance improvements

2018-10-19 Thread Maximilian Michels
Thanks Thomas, I think it is important to start looking at performance and improved test coverage. While we have the basic functionality, there is still state and timers to be implemented for the Portable FlinkRunner. These two will allow full testing/optimization: State:

Re: Stackoverflow Questions

2018-11-05 Thread Maximilian Michels
Great idea! I'd prefer a daily/weekly digest if possible. On 05.11.18 19:44, Tim Robertson wrote: Thanks for raising this Anton  It would be very easy to forward new SO questions to the user@ list, or a new list if we're worried about the noise. +1 (preference on user@ until there

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-11-07 Thread Maximilian Michels
job server is ready to provide this information and then migrate to the "full" list. This would be an easy path for SDKs to take on. They could "know" of a few well known options, and if they want to support all options, they implement the integration with the job server. On Fri, Oc

Re: [Euphoria] Looking for a reviewer.

2018-11-07 Thread Maximilian Michels
Yes, I'd still like to help out where possible but I missed your mail, David. Feel free to reach out to me via mail/Slack. Or simply mention me on the pull request. I'd leave this one to JB for now but will have a look tomorrow. Cheers, Max On 07.11.18 17:47, Lukasz Cwik wrote: Welcome back

Re: How to use "PortableRunner" in Python SDK?

2018-11-08 Thread Maximilian Michels
is due to my docker installation messed up). On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels <m...@apache.org> wrote: Hi, Please follow https://beam.apache.org/roadmap/portability/#python-on-flink Cheers, Max On 06.11.18 01:14, Ankur Goenka wrote: >
