Question about E2E tests for pipelines

2020-11-25 Thread Artur Khanin
Hi Devs, We are finalizing this PR with a pipeline that reads from Kafka and writes to Pub/Sub without any transformations in between. We would like to implement e2e tests where we create and execute a pipeline, but we haven't found much information

Re: Question about E2E tests for pipelines

2020-11-25 Thread Alexey Romanenko
For Kafka testing, there is a Kafka IT [1] that runs on Jenkins [2]. It leverages a real Kafka cluster that runs on k8s, so you can probably follow a similar approach. At the same time, we fake the Kafka consumer and its output for KafkaIO unit tests. [1]

Re: Documentation for Cross-Language Transforms

2020-11-25 Thread Alexey Romanenko
Great job, it should be very helpful for users! Just a minor note: it would be great to add an example of how to actually run a cross-language pipeline with the Portable Runner since, iirc, it required passing some additional arguments, like “--experiments=beam_fn_api”. > On 21 Nov 2020, at

Re: PTransform Annotations Proposal

2020-11-25 Thread Robert Burke
It sounds like we've come to the position that non-correctness-affecting PTransform annotations are valuable at both leaf and composite levels, and don't remove the potential need for a similar feature on Environments, to handle physical concerns/requirements for worker processes to have (such as

Re: PTransform Annotations Proposal

2020-11-25 Thread Robert Burke
Hmmm. Fair. I'm mostly concerned about the pathological case where we end up with a distinct Environment per transform, but there are likely practical cases where that's reasonable (High mem to GPU to TPU, to ARM) On Wed, Nov 25, 2020, 10:42 AM Robert Bradshaw wrote: > I'd like to continue

Query regarding Array_Agg impl

2020-11-25 Thread Sonam Ramchand
Hi Devs, I am trying to implement Array_Agg ( https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg ) for the Beam SQL ZetaSQL dialect as a CombineFn. Rough pseudocode: public static class ArrayAgg extends CombineFn { // todo } But then I learned that we cannot
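A rough illustration of the idea in the message above: an ARRAY_AGG-style aggregation fits the usual combine lifecycle of creating an accumulator, adding inputs, merging partial accumulators, and extracting the output. The sketch below is plain Python and does not import Beam. The class name `ArrayAggSketch` and the helper `combine_in_bundles` are hypothetical names made up for this example; only the four method names are borrowed from the CombineFn convention. It is a sketch of the pattern under those assumptions, not the implementation discussed in the thread.

```python
from typing import Any, Iterable, List


class ArrayAggSketch:
    """Hypothetical stand-in mirroring the four combine steps (no Beam import)."""

    def create_accumulator(self) -> List[Any]:
        # Start each partial aggregation with an empty list.
        return []

    def add_input(self, accumulator: List[Any], element: Any) -> List[Any]:
        # Collect one more element into the partial result.
        accumulator.append(element)
        return accumulator

    def merge_accumulators(self, accumulators: Iterable[List[Any]]) -> List[Any]:
        # Concatenate partial lists that were built independently.
        merged: List[Any] = []
        for accumulator in accumulators:
            merged.extend(accumulator)
        return merged

    def extract_output(self, accumulator: List[Any]) -> List[Any]:
        # Return a copy so callers cannot mutate the accumulator.
        return list(accumulator)


def combine_in_bundles(fn: ArrayAggSketch, bundles: Iterable[Iterable[Any]]) -> List[Any]:
    """Hypothetical driver: aggregate each bundle separately, then merge."""
    partials = []
    for bundle in bundles:
        accumulator = fn.create_accumulator()
        for element in bundle:
            accumulator = fn.add_input(accumulator, element)
        partials.append(accumulator)
    return fn.extract_output(fn.merge_accumulators(partials))


if __name__ == "__main__":
    print(combine_in_bundles(ArrayAggSketch(), [[1, 2], [3], []]))
```

Because a runner may process and merge partial results in any order, a real implementation should not rely on the order of elements in the output array.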

Re: PTransform Annotations Proposal

2020-11-25 Thread Robert Bradshaw
On Wed, Nov 25, 2020 at 10:15 AM Robert Burke wrote: > > It sounds like we've come to the position that non-correctness affecting > PTransform Annotations are valuable at both leaf and composite levels, and > don't remove the potential need for a similar feature on Environments, to > handle

Re: Implementing an IO Connector for Debezium

2020-11-25 Thread Boyuan Zhang
+dev Hi Bashir, We currently recommend using Splittable DoFn [1] to build new IO connectors. We have several examples of that in our codebase: Java examples: - Kafka

Re: PTransform Annotations Proposal

2020-11-25 Thread Robert Burke
Mostly because perfect is the enemy of good enough. We have a proposal, we have clear boundaries for it. It's fine if the discussion continues, but I see no evidence of concerns that should prevent starting an implementation, because it seems we'll need both anyway. On Wed, Nov 25, 2020, 10:25 AM

Re: PTransform Annotations Proposal

2020-11-25 Thread Robert Bradshaw
I'd like to continue the discussion *and* see an implementation for the part we've settled on. I was asking why not have "every distinct physical concern means a distinct environment?" On Wed, Nov 25, 2020 at 10:38 AM Robert Burke wrote: > > Mostly because perfect is the enemy of good enough. We

Docker Development Environment

2020-11-25 Thread Sam Rohde
Hi All, I got tired of my local dev environment being ruined by updates so I made a container for Apache Beam development work. What this does is create a Docker container from the Ubuntu Groovy image and load it up with all the necessary libraries/utilities for Apache Beam development. Then I

Re: Docker Development Environment

2020-11-25 Thread Ahmet Altay
Thank you for doing this. I have seen a few related PRs. Connecting them here in case these efforts could be combined: - https://github.com/apache/beam/pull/12837 (/cc +Omar Ismail ) - https://github.com/apache/beam/pull/13308 Ahmet On Wed, Nov 25, 2020 at 2:53 PM Sam Rohde wrote: > Hi All,

Re: Gradle Build issue

2020-11-25 Thread Sonam Ramchand
Thanks Michal! It helped. On Wed, Nov 25, 2020 at 1:23 PM Michał Walenia wrote: > Hi, > are you using your local installation of Gradle or the wrapper supplied in > the repo? If you're running the task with `gradle` command, try using > `./gradlew` instead, this will use the wrapper. > > Have a

Re: Gradle Build issue

2020-11-25 Thread Michał Walenia
Hi, are you using your local installation of Gradle or the wrapper supplied in the repo? If you're running the task with the `gradle` command, try using `./gradlew` instead; this will use the wrapper. Have a good day, Michal On Wed, Nov 25, 2020 at 8:44 AM Sonam Ramchand <

Re: Contributor permissions for Beam Jira

2020-11-25 Thread Alexey Romanenko
Hello Yuhong, I added you to the Contributors list. Welcome to Beam! Regards, Alexey > On 24 Nov 2020, at 20:16, 成雨虹 wrote: > > Hi Beam, > I work on the Samza team at LinkedIn and I would like to contribute to > the Samza Runner in Beam. My apache username is YC. Could I please have the >

Help measuring upcoming performance increase in flink runner on production systems

2020-11-25 Thread Teodor Spæren
Hey! My name is Teodor Spæren and I'm writing a master's thesis investigating the performance overhead of using Beam instead of using the underlying systems directly. My focus has been on Flink, and I've made a discovery about some unnecessary copying between operators in the Flink