Re: [EXTERNAL] Collect And Deploy Playground Examples - GH action is failing / flaky
Hi all,

Thank you for bringing this up! I filed a ticket, BEAM-14140 <https://issues.apache.org/jira/browse/BEAM-14140>, and we have started working on it. We want to approach it by gathering the list of changed examples and, if there are any, checking that they aren't broken. Does this seem like the right way to solve it?

Thanks,
Artur Khanin

On 21 Mar 2022, at 23:57, Ahmet Altay <al...@google.com> wrote:

On Mon, Mar 21, 2022 at 12:28 PM Robert Bradshaw <rober...@google.com> wrote:

On Mon, Mar 21, 2022 at 11:56 AM Ahmet Altay <al...@google.com> wrote:
>
> On Mon, Mar 21, 2022 at 8:57 AM Pablo Estrada <pabl...@google.com> wrote:
>>
>> Artur, Ilya, do you know why this action is perma-red?
>
> Thank you for looking, folks. And if there is not one, could you also please file a Jira issue?
>
>> Best
>> -P.
>>
>> On Mon, Mar 21, 2022 at 8:39 AM Jack McCluskey <jrmcclus...@google.com> wrote:
>>>
>>> Does this action need to run on every single PR? As far as I can tell it isn't saying anything about most PRs being run against it, pass or fail.
>
> I agree. It probably does not need to run on every PR. It could be a scheduled action instead.

This would drastically reduce its usefulness in determining whether/which PR breaks it. I would remove it as a precommit iff it's the kind of thing that we would not expect the typical PR to influence (e.g. could it be triggered on changes influencing a subdirectory?). Of course, the short-term option of moving/disabling it until it provides an actual signal (i.e. it's green unless there are actual regressions) makes sense.

My _understanding_: This action is used for detecting any new examples added to the repo and tagging those for inclusion in the Beam playground. If that is correct, a post commit might be a better option. A scheduled GH action would be similar to a post commit.

>>> On Fri, Mar 18, 2022 at 11:06 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This GH action seems to be recently added and permanently failing [1]. Is there a JIRA for addressing this? I assume not, but just checking, should it be blocking merges?
>>>>
>>>> Thank you!
>>>> Ahmet
>>>>
>>>> [1] https://github.com/apache/beam/actions/workflows/playground_deploy_examples.yml
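
The approach proposed at the top of this thread (gather the changed examples, then check only those) could look roughly like the sketch below. It is not the actual Beam workflow: the `examples` directory name, the file extensions, and all function names are assumptions made for illustration. Only `git diff --name-only` is an existing command.

```python
import subprocess
from pathlib import PurePosixPath

# Assumption: example sources sit under a directory named "examples" and use
# one of these extensions. Neither value is taken from the Beam repository.
EXAMPLE_DIR = "examples"
EXAMPLE_SUFFIXES = {".java", ".py", ".go"}


def filter_changed_examples(changed_paths):
    """Keep only the changed files that look like example sources."""
    selected = []
    for raw in changed_paths:
        path = PurePosixPath(raw.strip())
        in_example_dir = EXAMPLE_DIR in path.parts[:-1]
        if in_example_dir and path.suffix in EXAMPLE_SUFFIXES:
            selected.append(str(path))
    return selected


def list_changed_paths(base_ref, head_ref="HEAD"):
    """Ask git which files differ between base_ref and head_ref."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base_ref}...{head_ref}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def needs_example_check(changed_paths):
    """True only if at least one example changed; otherwise the check can be skipped."""
    return bool(filter_changed_examples(changed_paths))
```

With a filter like this, a PR that touches no examples would skip the check entirely, which matches the concern raised in the thread that the action says nothing useful about most PRs.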
Re: Question about E2E tests for pipelines
Thank you for the information and links, Alexey! We will try to follow this approach.

On 25 Nov 2020, at 21:27, Alexey Romanenko <aromanenko@gmail.com> wrote:

For Kafka testing, there is a Kafka IT [1] that runs on Jenkins [2]. It leverages a real Kafka cluster that runs on k8s, so you can probably follow a similar approach. At the same time, we fake the Kafka consumer and its output for KafkaIO unit tests.

[1] https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOIT.java
[2] https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_KafkaIO_IT.groovy

On 25 Nov 2020, at 13:05, Artur Khanin <artur.kha...@akvelon.com> wrote:

Hi Devs,

We are finalizing this PR <https://github.com/apache/beam/pull/13112> with a pipeline that reads from Kafka and writes to Pub/Sub without any transformations in between. We would like to implement e2e tests where we create and execute a pipeline, but we haven't found much information or many relevant examples. How exactly should we implement this kind of test? Can we somehow mock Kafka and Pub/Sub, or can we set them up in some test environment?

Any information and hints will be greatly appreciated!

Thanks,
Artur Khanin
Akvelon, Inc
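
The unit-test half of Alexey's suggestion (faking the consumer and its output) can be illustrated without any Beam or Kafka dependency. The sketch below is only a stand-in under stated assumptions: `FakeConsumer`, `FakePublisher`, and `run_passthrough` are hypothetical names, not KafkaIO or Pub/Sub APIs. It shows the shape of a test that pushes known records through a pass-through step and inspects what reached the sink.

```python
class FakeConsumer:
    """Hypothetical in-memory stand-in for a Kafka consumer."""

    def __init__(self, records):
        self._records = list(records)

    def poll(self):
        # Hand out every buffered record once, then report nothing new.
        batch, self._records = self._records, []
        return batch


class FakePublisher:
    """Hypothetical in-memory stand-in for a Pub/Sub publisher."""

    def __init__(self):
        self.published = []

    def publish(self, payload):
        # Record the payload so a test can inspect it afterwards.
        self.published.append(payload)


def run_passthrough(consumer, publisher):
    """Copy records from source to sink unchanged; return how many were moved."""
    moved = 0
    for record in consumer.poll():
        publisher.publish(record)
        moved += 1
    return moved
```

A test then builds a `FakeConsumer` from known records, runs the step, and compares `FakePublisher.published` with the input. The integration-test half (a real cluster, as in KafkaIOIT) is not covered by this sketch.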
Question about E2E tests for pipelines
Hi Devs,

We are finalizing this PR <https://github.com/apache/beam/pull/13112> with a pipeline that reads from Kafka and writes to Pub/Sub without any transformations in between. We would like to implement e2e tests where we create and execute a pipeline, but we haven't found much information or many relevant examples. How exactly should we implement this kind of test? Can we somehow mock Kafka and Pub/Sub, or can we set them up in some test environment?

Any information and hints will be greatly appreciated!

Thanks,
Artur Khanin
Akvelon, Inc
Proposal: Beam Template-like Example to protect sensitive data
Hi Community!

Some users may want to protect their sensitive data using tokenization. We propose to create a Beam example template that demonstrates a Beam transform for protecting sensitive data using tokenization. In our example, we will use an external service for the data tokenization.

At a high level, the pipeline will:
* support batch (GCS) and streaming (Pub/Sub) input sources
* tokenize sensitive data via an external REST service (we plan to use Protegrity)
* output tokenized data into BigQuery or BigTable

I created JIRA ticket BEAM-11322 <https://issues.apache.org/jira/browse/BEAM-11322> to describe this proposal and capture feedback. More details and the proposed design are available in the design doc <https://docs.google.com/document/d/1fnsUfGpCx8A_MBchPRvlm4gU0Ai5EQNSiZS1mg_A_zg/edit?usp=sharing>.

I welcome community feedback and comments on this Beam data tokenization template proposal.

Thanks,
Artur Khanin
Akvelon, Inc
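
To make the tokenization step in the proposal concrete, here is a minimal sketch of replacing sensitive fields through one batched call to an external service. Everything in it is an assumption for illustration: `tokenize_records` and `FakeTokenService` are hypothetical names, the fake keeps an in-memory table instead of calling a REST endpoint, and none of it reflects the Protegrity API or the design doc.

```python
def tokenize_records(records, sensitive_fields, tokenize_batch):
    """Return copies of records with sensitive field values replaced by tokens.

    tokenize_batch is any callable that maps a list of values to a list of
    tokens of the same length (for example, a wrapper around a REST call).
    """
    # Collect every sensitive value first so the service is called only once.
    values = [
        record[field]
        for record in records
        for field in sensitive_fields
        if field in record
    ]
    tokens = tokenize_batch(values)
    if len(tokens) != len(values):
        raise ValueError("tokenizer returned an unexpected number of tokens")

    # Walk the records in the same order to put each token in its place.
    token_iter = iter(tokens)
    tokenized = []
    for record in records:
        copy = dict(record)
        for field in sensitive_fields:
            if field in copy:
                copy[field] = next(token_iter)
        tokenized.append(copy)
    return tokenized


class FakeTokenService:
    """Hypothetical stand-in for the external tokenization service."""

    def __init__(self):
        self._vault = {}

    def tokenize_batch(self, values):
        tokens = []
        for value in values:
            # The same input value always maps to the same token.
            if value not in self._vault:
                self._vault[value] = f"tok-{len(self._vault) + 1}"
            tokens.append(self._vault[value])
        return tokens
```

Batching the values keeps the number of round trips to the external service low, which matters for the streaming case. Passing the service in as a callable lets tests swap in the fake.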
Question about saving data to use across runner's instances
Hi all,

I am designing a Dataflow pipeline in Java that has to:
* Read a file (it may be pretty large) during initialization and then store it in some sort of shared memory
* Periodically update this file
* Make this file available to read across all of the runner's instances
* Persist this file across restarts, crashes, scale-ups, and scale-downs

I found some information about stateful processing in Beam using a Stateful DoFn <https://beam.apache.org/blog/stateful-processing/>. Is it an appropriate way to handle such functionality, or is there a better approach?

Any help or information is greatly appreciated!

Thanks,
Artur Khanin
Akvelon, Inc.
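
One way to read the requirements above is to keep the file in durable external storage and let each worker hold a cached copy that it reloads on a fixed interval. A restarted or newly added worker then simply loads the current version. The sketch below illustrates only that caching idea in plain Python. It is an assumption rather than an answer given on the list, `RefreshingFileCache` is a hypothetical name, and it is not a Beam or Dataflow API.

```python
import time


class RefreshingFileCache:
    """Per-worker cache that reloads its content after refresh_seconds.

    load is any zero-argument callable that fetches the current file content
    from durable storage; clock is injectable so tests need not sleep.
    """

    def __init__(self, load, refresh_seconds, clock=time.monotonic):
        self._load = load
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        stale = (
            self._loaded_at is None
            or now - self._loaded_at >= self._refresh_seconds
        )
        if stale:
            # First access or interval elapsed: fetch a fresh copy.
            self._value = self._load()
            self._loaded_at = now
        return self._value
```

Because the durable store is the source of truth, the cache itself needs no persistence. Workers may briefly see different versions between refreshes, so this fits only if that staleness is acceptable.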
Contributor permission for Beam Jira tickets
Hi,

This is Artur from Akvelon. Right now I am working on Dataflow templates in Apache Beam. Can someone add me as a contributor for Beam's Jira issue tracker? I would like to create/assign tickets for my work.

My Jira ID: Artur.Khanin

Thanks,
Artur Khanin