Re: [EXTERNAL] Collect And Deploy Playground Examples - GH action is failing / flaky

2022-03-21 Thread Artur Khanin
Hi all,

Thank you for bringing this up! I filed a ticket, 
BEAM-14140<https://issues.apache.org/jira/browse/BEAM-14140>, and we have 
started working on it.
Our plan is to gather the list of changed examples and, if there are any, 
check that they aren't broken. Does that seem like the right way to solve 
it?
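
To make the idea concrete, here is a minimal sketch of the "gather changed 
examples" step. It is only an illustration, not the code from the ticket: the 
directory prefixes, the suffix list, and the function names are assumptions, 
and the changed paths are assumed to come from something like 
`git diff --name-only`.

```python
from pathlib import PurePosixPath

# Assumed locations of example sources; adjust to the real repo layout.
EXAMPLE_ROOTS = ("examples/", "sdks/python/apache_beam/examples/")
# Assumed set of source file types that count as examples.
EXAMPLE_SUFFIXES = {".java", ".py", ".go"}


def changed_examples(changed_paths):
    """Return the changed files that look like example sources, sorted."""
    selected = []
    for path in changed_paths:
        in_root = any(path.startswith(root) for root in EXAMPLE_ROOTS)
        if in_root and PurePosixPath(path).suffix in EXAMPLE_SUFFIXES:
            selected.append(path)
    return sorted(selected)


def should_run_checks(changed_paths):
    """Only run the example checks when at least one example changed."""
    return bool(changed_examples(changed_paths))
```

With a filter like this, a PR that touches no examples would skip the check 
entirely, so the action would only report on PRs it can actually say 
something about.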

Thanks,
Artur Khanin

On 21 Mar 2022, at 23:57, Ahmet Altay <al...@google.com> wrote:



On Mon, Mar 21, 2022 at 12:28 PM Robert Bradshaw <rober...@google.com> wrote:
On Mon, Mar 21, 2022 at 11:56 AM Ahmet Altay <al...@google.com> wrote:
>
> On Mon, Mar 21, 2022 at 8:57 AM Pablo Estrada <pabl...@google.com> wrote:
>>
>> Artur, Ilya, do you know why this action is perma-red?
>
> Thank you for looking, folks. If there isn't one already, could you also 
> please file a Jira issue?
>
>> Best
>> -P.
>>
>> On Mon, Mar 21, 2022 at 8:39 AM Jack McCluskey <jrmcclus...@google.com> wrote:
>>>
>>> Does this action need to run on every single PR? As far as I can tell it 
>>> isn't saying anything about most PRs being run against it, pass or fail.
>
> I agree. Probably does not need to run on every PR. It could be a scheduled 
> action instead.

This would drastically reduce its usefulness in determining
whether/which PR breaks it. I would remove it as a precommit iff it's
the kind of thing that we would not expect the typical PR to influence
(e.g. could it be triggered on changes influencing a subdirectory?).

Of course the short term option of moving/disabling it until it
provides an actual signal (i.e. it's green unless there are actual
regressions) makes sense.

My _understanding_: This action detects any new examples added to the repo 
and tags them for inclusion in the Beam Playground. If that is correct, a 
post-commit might be a better option. A scheduled GH action would be similar 
to a post-commit.


>>> On Fri, Mar 18, 2022 at 11:06 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This GH action seems to be recently added and permanently failing [1]. Is 
>>>> there a JIRA for addressing this? I assume not, but just checking, should 
>>>> it be blocking merges?
>>>>
>>>> Thank you!
>>>> Ahmet
>>>>
>>>> [1] 
>>>> https://github.com/apache/beam/actions/workflows/playground_deploy_examples.yml



Re: Question about E2E tests for pipelines

2020-11-26 Thread Artur Khanin
Thank you for the information and links, Alexey! We will try to follow this 
approach.

On 25 Nov 2020, at 21:27, Alexey Romanenko <aromanenko@gmail.com> wrote:

For Kafka testing, there is a Kafka IT [1] that runs on Jenkins [2]. It 
leverages a real Kafka cluster that runs on k8s. So you can probably follow 
a similar approach.

At the same time, we fake the Kafka consumer and its output for KafkaIO unit 
tests.
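
As a rough illustration of the "fake the consumer" idea for unit tests, here 
is a minimal, language-neutral sketch in plain Python. It does not use Beam, 
KafkaIO, or any real Kafka or Pub/Sub client: `FakeKafkaConsumer`, 
`FakePubSubPublisher`, and `copy_all` are hypothetical names, and the 
pass-through loop only stands in for a pipeline with no transformations in 
between.

```python
class FakeKafkaConsumer:
    """In-memory stand-in for a Kafka consumer (hypothetical, not a client API)."""

    def __init__(self, records):
        self._records = list(records)

    def poll(self, max_records):
        """Return up to max_records pending records and drop them from the queue."""
        batch = self._records[:max_records]
        self._records = self._records[max_records:]
        return batch


class FakePubSubPublisher:
    """In-memory stand-in for a Pub/Sub publisher that records what it is sent."""

    def __init__(self):
        self.published = []

    def publish(self, message):
        self.published.append(message)


def copy_all(consumer, publisher, batch_size=2):
    """Move every record from consumer to publisher unchanged; return the count."""
    count = 0
    while True:
        batch = consumer.poll(batch_size)
        if not batch:
            return count
        for record in batch:
            publisher.publish(record)
            count += 1
```

A unit test can then assert that everything fed into the fake consumer shows 
up, in order and unmodified, in `publisher.published`. The integration test 
against a real cluster covers what the fakes cannot.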

[1] 
https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOIT.java
[2] 
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_KafkaIO_IT.groovy


On 25 Nov 2020, at 13:05, Artur Khanin <artur.kha...@akvelon.com> wrote:

Hi Devs,

We are finalizing this PR<https://github.com/apache/beam/pull/13112> with a 
pipeline that reads from Kafka and writes to Pub/Sub without any 
transformations in between. We would like to implement e2e tests where we 
create and execute a pipeline, but we haven't found much information or 
many relevant examples. How exactly should we implement this kind of test? 
Can we somehow mock Kafka and Pub/Sub, or can we set them up in a test 
environment?

Any information and hints will be greatly appreciated!

Thanks,
Artur Khanin
Akvelon, Inc





Question about E2E tests for pipelines

2020-11-25 Thread Artur Khanin
Hi Devs,

We are finalizing this PR<https://github.com/apache/beam/pull/13112> with a 
pipeline that reads from Kafka and writes to Pub/Sub without any 
transformations in between. We would like to implement e2e tests where we 
create and execute a pipeline, but we haven't found much information or 
many relevant examples. How exactly should we implement this kind of test? 
Can we somehow mock Kafka and Pub/Sub, or can we set them up in a test 
environment?

Any information and hints will be greatly appreciated!

Thanks,
Artur Khanin
Akvelon, Inc



Proposal: Beam Template-like Example to protect sensitive data

2020-11-23 Thread Artur Khanin
Hi Community!

Some users may want to protect their sensitive data using tokenization.
We propose to create a Beam example template that demonstrates a Beam 
transform for protecting sensitive data using tokenization. In our example, 
we will use an external service for the data tokenization.

At a high level, the pipeline will:

  *   support batch (GCS) and streaming (Pub/Sub) input sources
  *   tokenize sensitive data via an external REST service (we plan to use 
Protegrity)
  *   output tokenized data into BigQuery or Bigtable
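
The tokenization step in the list above could look roughly like the sketch 
below. This is an illustration only: `fake_tokenize` is a local stand-in for 
the external REST call (it is not Protegrity's API, and unlike a real 
tokenization service it is a one-way keyed hash that cannot be reversed), and 
the record layout and field names are assumptions.

```python
import hashlib
import hmac


def fake_tokenize(value, key=b"demo-key"):
    """Stand-in for the external tokenization call: a truncated keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def tokenize_record(record, sensitive_fields, tokenize=fake_tokenize):
    """Return a copy of record with the sensitive fields replaced by tokens."""
    return {
        field: tokenize(str(value)) if field in sensitive_fields else value
        for field, value in record.items()
    }
```

Passing the tokenizer in as a parameter keeps the transform testable without 
network access. The real service call can be swapped in where the pipeline 
is wired together.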


I created JIRA ticket 
BEAM-11322<https://issues.apache.org/jira/browse/BEAM-11322> to describe this 
proposal and capture feedback. More details and the proposed design are 
available in the design 
doc<https://docs.google.com/document/d/1fnsUfGpCx8A_MBchPRvlm4gU0Ai5EQNSiZS1mg_A_zg/edit?usp=sharing>.

I welcome community feedback and comments regarding this Beam data tokenization 
template proposal.

Thanks,
Artur Khanin
Akvelon, Inc



Question about saving data to use across runner's instances

2020-11-15 Thread Artur Khanin
Hi all,

I am designing a Dataflow pipeline in Java that has to:

  *   Read a file (it may be pretty large) during initialization and then store 
it in some sort of shared memory
  *   Periodically update this file
  *   Make this file available to read across all runner instances
  *   Persist this file across restarts, crashes, and scale-up/scale-down events

I found some information about stateful processing in Beam using Stateful 
DoFn<https://beam.apache.org/blog/stateful-processing/>. Is it an appropriate 
way to handle such functionality, or is there a better approach for it?
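
For reference, the "load once, refresh periodically" part of these 
requirements can be sketched without any Beam API as below. This is only an 
assumption about one possible shape of the solution, not a recommendation 
from the Beam docs: `RefreshingValue` and its parameters are hypothetical, 
and `loader` stands for whatever reads the file from durable storage, so 
that persistence comes from the storage and not from worker memory.

```python
import time


class RefreshingValue:
    """Caches loader() output and reloads it once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._loaded_at = None
        self._value = None

    def get(self):
        """Return the cached value, reloading it first if it has expired."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

Each worker would hold its own copy and refresh it independently, which 
avoids shared memory across instances at the cost of workers briefly seeing 
different versions of the file.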

Any help or information would be greatly appreciated!

Thanks,
Artur Khanin
Akvelon, Inc.



Contributor permission for Beam Jira tickets

2020-10-02 Thread Artur Khanin
Hi,

This is Artur from Akvelon. Right now I am working on Dataflow templates in 
Apache Beam. Can someone add me as a contributor for Beam's Jira issue tracker? 
I would like to create/assign tickets for my work.

My Jira ID: Artur.Khanin

Thanks,
Artur Khanin