Re: Make KafkaIO usable from Dataflow Template?

2021-03-24 Thread Ahmet Altay
On Wed, Mar 24, 2021 at 10:58 AM Alexey Romanenko 
wrote:

> Robert, could you elaborate a bit why or point me out if it was already
> discussed?
>

I will speak to this from a Dataflow perspective. As Alex pointed out above,
Dataflow's flex templates allow for a more flexible (:p) way of creating
templates and in most ways supersede classic Dataflow templates. (There is a
comparison here [1].) One advantage is that there is no longer a need to
propagate ValueProviders across the code base in order to support
Dataflow's template mechanism.

I might be missing context on how ValueProviders are used by other runners,
and they might still be useful in those cases. My understanding was that
ValueProvider was only supported by Dataflow, but I might be wrong.

[1]
https://cloud.google.com/dataflow/docs/concepts/dataflow-templates#evaluating-which-template-type-to-use
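
To make the classic-template constraint concrete, here is a minimal sketch
(the option names are made up and are not actual KafkaIO options): with a
classic template, every option that must be settable at execution time has to
be declared as a ValueProvider and threaded through the transforms, while a
flex template builds the pipeline at launch time and can use plain options.

import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.ValueProvider;

public interface ExampleKafkaTemplateOptions extends PipelineOptions {
  // Classic template: the value is only known when the template is executed,
  // so it must be a ValueProvider, and every transform that uses it has to
  // accept a ValueProvider rather than a plain String.
  ValueProvider<String> getBootstrapServers();
  void setBootstrapServers(ValueProvider<String> value);

  // Flex template: the pipeline graph is built at launch time, so a plain
  // option works and no ValueProvider plumbing is needed.
  String getTopic();
  void setTopic(String value);
}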


>
> On 24 Mar 2021, at 00:11, Robert Bradshaw  wrote:
>
> I would encourage flex templates over further proliferation of
> ValueProviders.
>
> On Tue, Mar 23, 2021 at 12:42 PM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> I think you are right - now with SDF support in KafkaIO it should be
>> possible to determine the number of splits at runtime and support
>> ValueProviders in that way.
>>
>> CC: Boyuan Zhang
>>
>> On 23 Mar 2021, at 18:18, Vincent Marquez 
>> wrote:
>>
>> Hello. I was looking at how to use KafkaIO from a Dataflow template, which
>> requires all configuration options to be ValueProviders, which KafkaIO
>> doesn't currently support.  I saw this old PR:
>> https://github.com/apache/beam/pull/6636/files
>>
>> I believe the reason this was never merged was that there wasn't a good way
>> to determine the number of splits to fire up in the UnboundedSource for
>> KafkaIO.  However, now that KafkaIO is based on SplittableDoFn, which
>> handles splitting dynamically, are there still any blockers to this?
>>
>> Could we change all the KafkaIO parameters to ValueProviders now?
>>
>> *~Vincent*
>>
>>
>>
>


Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Vincent Marquez
*~Vincent*


On Wed, Mar 24, 2021 at 6:07 PM Kenneth Knowles  wrote:

> The reason I was checking out the code is that sometimes a natural thing
> to output would be a summary of what was written. So each chunk of writes
> and the final chunk written in @FinishBundle. This is, for example, what
> SQL engines do (output # of rows written).
>
> You could output both the summary and the full list of written elements to
> different outputs, and users can choose. Outputs that are never consumed
> should be very low or zero cost.
>
>
I like this approach.  I would much prefer two outputs (one of which is all
elements written) to returning an existential/wildcard PCollection.



> Kenn
>
> On Wed, Mar 24, 2021 at 5:36 PM Robert Bradshaw 
> wrote:
>
>> Yeah, the entire input is not always what is needed, and can generally be
>> achieved via
>>
>> input -> wait(side input of write) -> do something with the input
>>
>> Of course one could also do
>>
>> entire_input_as_output_of_wait -> MapTo(KV.of(null, null)) ->
>> CombineGlobally(TrivialCombineFn)
>>
>> to reduce this to a more minimal set with at least one element per
>> Window.
>>
>> The file writing operations emit the actual files that were written,
>> which can be handy. My suggestion of PCollection was just so that we can
>> emit something usable, and decide exactly what is most useful later.
>>
>>
>> On Wed, Mar 24, 2021 at 5:30 PM Reuven Lax  wrote:
>>
>>> I believe that the Wait transform turns this output into a side input,
>>> so outputting the input PCollection might be problematic.
>>>
>>> On Wed, Mar 24, 2021 at 4:49 PM Kenneth Knowles  wrote:
>>>
 Alex's idea sounds good and like what Vincent maybe implemented. I am
 just reading really quickly so sorry if I missed something...

 Checking out the code for the WriteFn I see a big problem:

 @Setup
 public void setup() {
   writer = new Mutator<>(spec, Mapper::saveAsync, "writes");
 }

 @ProcessElement
   public void processElement(ProcessContext c) throws
 ExecutionException, InterruptedException {
   writer.mutate(c.element());
 }

 @Teardown
 public void teardown() throws Exception {
   writer.close();
   writer = null;
 }

 It is only in writer.close() that all async writes are waited on. This
 needs to happen in @FinishBundle.

 Did you discover this when implementing your own Cassandra.Write?

 Until you have waited on the future, you should not output the element
 as "has been written". And you cannot output from the @TearDown method
 which is just for cleaning up resources.

 Am I reading this wrong?

 Kenn

 On Wed, Mar 24, 2021 at 4:35 PM Alex Amato  wrote:

> How about a PCollection containing every element which was
> successfully written?
> Basically the same things which were passed into it.
>
> Then you could act on every element after its been successfully
> written to the sink.
>
> On Wed, Mar 24, 2021 at 3:16 PM Robert Bradshaw 
> wrote:
>
>> On Wed, Mar 24, 2021 at 2:36 PM Ismaël Mejía 
>> wrote:
>>
>>> +dev
>>>
>>> Since we all agree that we should return something different than
>>> PDone the real question is what should we return.
>>>
>>
>> My proposal is that one returns a PCollection that consists,
>> internally, of something contentless like nulls. This is future 
>> compatible
>> with returning something more meaningful based on the source or write
>> process itself, but at least this would be followable.
>>
>>
>>> As a reminder we had a pretty interesting discussion about this
>>> already in the past but uniformization of our return values has not
>>> happened.
>>> This thread is worth reading for Vincent or anyone who wants to
>>> contribute Write transforms that return.
>>>
>>> https://lists.apache.org/thread.html/d1a4556a1e13a661cce19021926a5d0997fbbfde016d36989cf75a07%40%3Cdev.beam.apache.org%3E
>>
>>
>> Yeah, we should go ahead and finally do something.
>>
>>
>>>
>>> > Returning PDone is an anti-pattern that should be avoided, but
>>> changing it now would be backwards incompatible.
>>>
>>> Periodic reminder most IOs are still Experimental so I suppose it is
>>> worth to the maintainers to judge if the upgrade to return something
>>> different from PDone is worth it; in that case we can deprecate and remove
>>> the previous signature in short time (2 releases was the average for
>>> previous cases).
>>>
>>>
>>> On Wed, Mar 24, 2021 at 10:24 PM Alexey Romanenko
>>>  wrote:
>>> >
>>> > I thought that was said about returning a PCollection of write
>>> results as it’s done in other IOs (as I mentioned as examples) that have
>>> 

Re: BEAM-9185 Publish pre-release python artifacts (RCs) to PyPI

2021-03-24 Thread Kenneth Knowles
Caveat here: I am updating the scripts and release process as I work
through 2.29.0. That script is new as of last week.

I would prefer the `choose_rc_commit.sh` script to only choose the commit
and push the RC tag. This way the chosen commit is set regardless of how
many times we might have to try or adjust other steps. I think that
publishing the wheels probably fits better into
`build_release_candidate.sh` since that script reads the tag, builds a
variety of things, and uploads them to various places. I haven't started to
update that script much yet, though. I don't know if there is a reason it
will not work.

Kenn

On Wed, Mar 24, 2021 at 6:12 PM Ahmet Altay  wrote:

> This looks like a good and reasonable approach to me. I have not reviewed
> the PR, but I agree with outlined modifications.
>
> It will be good to get reviews from +Valentyn Tymofieiev
>  (who has lots of experience in this area) and +Kenneth
> Knowles  (who is the current release manager to
> understand how this fits into the release process.)
>
> On Wed, Mar 24, 2021 at 2:24 PM Josias Rico García <
> josias.r...@wizeline.com> wrote:
>
>> Hello devs
>>
>>
>> @Benjamin Gonzalez Delgado  and I have been
>> working on a solution for Jira issue BEAM-9185
>> , and we’ve done the
>> following:
>>
>> Background:
>>
>> When executing choose_rc_commit.sh, a commit tag is created, and the
>> automated Github actions “build_wheels” detect the name of the tag that
>> contains the *rc* version and generates both RC and non-RC artifacts.
>> Release Manager can execute build_release_candidate.sh as usual.
>>
>> Modification:
>>
>> At the end of stage 7. Build a release candidate:
>>
>> 1. Added step "Upload release candidate to PyPI" to execute
>>    deploy_release_candidate_pypi.sh, which:
>>    1. Downloads the source distribution and wheels tagged as *rc*.
>>    2. Deploys the release candidate to PyPI.
>>       1. These artifacts will be uploaded to PyPI as a pre-release.
>>       2. A link will be added as part of the voting.
>> 2. Updated the *Release Guide*.
>>
>> Important Notes:
>>
>> - According to the "Publishing pre-release artifacts to repositories"
>>   discussion thread, this snapshot is not targeted at users.
>> - The voting process will remain the same: over the binaries in svn,
>>   not the RC.
>> - Use the RC binaries in PyPI as needed: download them with pip install
>>   --pre.
>> - It is not necessary to rebuild after the voting.
>>
>>
>> Please let us know any comments or doubts. You can find the PR with the
>> complete changes here: https://github.com/apache/beam/pull/14325.
>>
>> We are pleased to hear from you.
>>
>>
>> --
>>
>> Josias Misael Rico Garcia | WIZELINE
>>
>> Technical Writer
>>
>> josias.r...@wizeline.com
>>
>
>


Re: BEAM-9185 Publish pre-release python artifacts (RCs) to PyPI

2021-03-24 Thread Ahmet Altay
This looks like a good and reasonable approach to me. I have not reviewed
the PR, but I agree with the outlined modifications.

It will be good to get reviews from +Valentyn Tymofieiev
 (who has lots of experience in this area) and +Kenneth
Knowles  (who is the current release manager) to understand
how this fits into the release process.

On Wed, Mar 24, 2021 at 2:24 PM Josias Rico García 
wrote:

> Hello devs
>
>
> @Benjamin Gonzalez Delgado  and I have been
> working on a solution for Jira issue BEAM-9185
> , and we’ve done the
> following:
>
> Background:
>
> When executing choose_rc_commit.sh, a commit tag is created, and the
> automated Github actions “build_wheels” detect the name of the tag that
> contains the *rc* version and generates both RC and non-RC artifacts.
> Release Manager can execute build_release_candidate.sh as usual.
>
> Modification:
>
> At the end of stage 7. Build a release candidate:
>
> 1. Added step "Upload release candidate to PyPI" to execute
>    deploy_release_candidate_pypi.sh, which:
>    1. Downloads the source distribution and wheels tagged as *rc*.
>    2. Deploys the release candidate to PyPI.
>       1. These artifacts will be uploaded to PyPI as a pre-release.
>       2. A link will be added as part of the voting.
> 2. Updated the *Release Guide*.
>
> Important Notes:
>
> - According to the "Publishing pre-release artifacts to repositories"
>   discussion thread, this snapshot is not targeted at users.
> - The voting process will remain the same: over the binaries in svn,
>   not the RC.
> - Use the RC binaries in PyPI as needed: download them with pip install
>   --pre.
> - It is not necessary to rebuild after the voting.
>
>
> Please let us know any comments or doubts. You can find the PR with the
> complete changes here: https://github.com/apache/beam/pull/14325.
>
> We are pleased to hear from you.
>
>
> --
>
> Josias Misael Rico Garcia | WIZELINE
>
> Technical Writer
>
> josias.r...@wizeline.com
>


Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Kenneth Knowles
The reason I was checking out the code is that sometimes a natural thing to
output would be a summary of what was written: one for each chunk of writes,
and one for the final chunk written in @FinishBundle. This is, for example,
what SQL engines do (output # of rows written).

You could output both the summary and the full list of written elements to
different outputs, and users can choose. Outputs that are never consumed
should be very low or zero cost.
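
As a rough sketch of that idea (not CassandraIO's actual code; the write
itself is elided, the names are made up, and emitting the summary in the
global window is only to keep the example short), a DoFn can put every
written element on its main output and a per-bundle summary on a secondary
output:

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

class WriteWithOutputs {
  static final TupleTag<String> WRITTEN_ELEMENTS = new TupleTag<String>() {};
  static final TupleTag<Long> WRITE_SUMMARIES = new TupleTag<Long>() {};

  static class WriteFn extends DoFn<String, String> {
    private transient long writtenInBundle;

    @StartBundle
    public void startBundle() {
      writtenInBundle = 0;
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
      // ... perform the actual write of c.element() here (elided) ...
      writtenInBundle++;
      c.output(c.element()); // main output: every element that was written
    }

    @FinishBundle
    public void finishBundle(FinishBundleContext c) {
      // Secondary output: one summary per bundle. Using the global window
      // keeps the sketch short; a real transform would track the windows of
      // the elements it wrote.
      c.output(WRITE_SUMMARIES, writtenInBundle,
          GlobalWindow.INSTANCE.maxTimestamp(), GlobalWindow.INSTANCE);
    }
  }

  static PCollectionTuple applyWrite(PCollection<String> input) {
    // Callers use .get(WRITTEN_ELEMENTS) for the written elements and
    // .get(WRITE_SUMMARIES) for the per-bundle counts.
    return input.apply("Write",
        ParDo.of(new WriteFn())
            .withOutputTags(WRITTEN_ELEMENTS, TupleTagList.of(WRITE_SUMMARIES)));
  }
}

Consumers that only want the count read the summary output, consumers that
want to act per element read the main output, and anything unconsumed should
cost little or nothing, as noted above.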

Kenn

On Wed, Mar 24, 2021 at 5:36 PM Robert Bradshaw  wrote:

> Yeah, the entire input is not always what is needed, and can generally be
> achieved via
>
> input -> wait(side input of write) -> do something with the input
>
> Of course one could also do
>
> entire_input_as_output_of_wait -> MapTo(KV.of(null, null)) ->
> CombineGlobally(TrivialCombineFn)
>
> to reduce this to a more minimal set with at least one element per Window.
>
> The file writing operations emit the actual files that were written, which
> can be handy. My suggestion of PCollection was just so that we can emit
> something usable, and decide exactly what is most useful later.
>
>
> On Wed, Mar 24, 2021 at 5:30 PM Reuven Lax  wrote:
>
>> I believe that the Wait transform turns this output into a side input, so
>> outputting the input PCollection might be problematic.
>>
>> On Wed, Mar 24, 2021 at 4:49 PM Kenneth Knowles  wrote:
>>
>>> Alex's idea sounds good and like what Vincent maybe implemented. I am
>>> just reading really quickly so sorry if I missed something...
>>>
>>> Checking out the code for the WriteFn I see a big problem:
>>>
>>> @Setup
>>> public void setup() {
>>>   writer = new Mutator<>(spec, Mapper::saveAsync, "writes");
>>> }
>>>
>>> @ProcessElement
>>>   public void processElement(ProcessContext c) throws
>>> ExecutionException, InterruptedException {
>>>   writer.mutate(c.element());
>>> }
>>>
>>> @Teardown
>>> public void teardown() throws Exception {
>>>   writer.close();
>>>   writer = null;
>>> }
>>>
>>> It is only in writer.close() that all async writes are waited on. This
>>> needs to happen in @FinishBundle.
>>>
>>> Did you discover this when implementing your own Cassandra.Write?
>>>
>>> Until you have waited on the future, you should not output the element
>>> as "has been written". And you cannot output from the @TearDown method
>>> which is just for cleaning up resources.
>>>
>>> Am I reading this wrong?
>>>
>>> Kenn
>>>
>>> On Wed, Mar 24, 2021 at 4:35 PM Alex Amato  wrote:
>>>
 How about a PCollection containing every element which was successfully
 written?
 Basically the same things which were passed into it.

 Then you could act on every element after it's been successfully written
 to the sink.

 On Wed, Mar 24, 2021 at 3:16 PM Robert Bradshaw 
 wrote:

> On Wed, Mar 24, 2021 at 2:36 PM Ismaël Mejía 
> wrote:
>
>> +dev
>>
>> Since we all agree that we should return something different than
>> PDone the real question is what should we return.
>>
>
> My proposal is that one returns a PCollection that consists,
> internally, of something contentless like nulls. This is future compatible
> with returning something more meaningful based on the source or write
> process itself, but at least this would be followable.
>
>
>> As a reminder we had a pretty interesting discussion about this
>> already in the past but uniformization of our return values has not
>> happened.
>> This thread is worth reading for Vincent or anyone who wants to
>> contribute Write transforms that return.
>>
>> https://lists.apache.org/thread.html/d1a4556a1e13a661cce19021926a5d0997fbbfde016d36989cf75a07%40%3Cdev.beam.apache.org%3E
>
>
> Yeah, we should go ahead and finally do something.
>
>
>>
>> > Returning PDone is an anti-pattern that should be avoided, but
>> changing it now would be backwards incompatible.
>>
>> Periodic reminder most IOs are still Experimental so I suppose it is
>> worth to the maintainers to judge if the upgrade to return something
>> different from PDone is worth it; in that case we can deprecate and remove
>> the previous signature in short time (2 releases was the average for
>> previous cases).
>>
>>
>> On Wed, Mar 24, 2021 at 10:24 PM Alexey Romanenko
>>  wrote:
>> >
>> > I thought that was said about returning a PCollection of write
>> results as it’s done in other IOs (as I mentioned as examples) that have
>> _additional_ write methods, like “withWriteResults()” etc, that return
>> PTransform<…, PCollection>.
>> > In this case, we keep backwards compatibility and just add new
>> functionality. Though, we need to follow the same pattern for user API and
>> maybe even naming for this feature across different IOs (like we have for
>> "readAll()” methods).
>> >

Re: BEAM-3713: Moving from nose to pytest

2021-03-24 Thread Ahmet Altay
All PRs look either merged or closed.

+Udi Meiri  might have more information about the
remaining work.

On Wed, Mar 24, 2021 at 5:29 PM Benjamin Gonzalez Delgado <
benjamin.gonza...@wizeline.com> wrote:

> Hi team,
> I am planning to work on BEAM-3713
> , but I see there are
> PRs related to the task.
> Could someone guide me on what work remains for the
> migration from nose to pytest?
> Any guidance on this would be appreciated.
>
> Thanks!
> Benjamin
>


Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Robert Bradshaw
Yeah, the entire input is not always what is needed, and can generally be
achieved via

input -> wait(side input of write) -> do something with the input

Of course one could also do

entire_input_as_output_of_wait -> MapTo(KV.of(null, null)) ->
CombineGlobally(TrivialCombineFn)

to reduce this to a more minimal set with at least one element per Window.

The file writing operations emit the actual files that were written, which
can be handy. My suggestion of PCollection was just so that we can emit
something usable, and decide exactly what is most useful later.
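
To make the first pattern above concrete, here is a minimal sketch using the
Wait transform (the names input and writeResult are placeholders, and it
assumes the write returns some PCollection to use as a signal):

import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Wait;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

static PCollection<String> doSomethingAfterWrite(
    PCollection<String> input, PCollection<?> writeResult) {
  return input
      // Each window of 'input' is held back until the corresponding window of
      // 'writeResult' is complete, i.e. until the write has actually happened.
      .apply("WaitOnWrite", Wait.on(writeResult))
      // Placeholder for "do something with the input" once the write is done.
      .apply("UseInput",
          MapElements.into(TypeDescriptors.strings()).via((String x) -> x));
}

As noted above, the file-based writes (and, e.g., JdbcIO) already return such
a collection, so any write that returns a PCollection can serve as the signal.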


On Wed, Mar 24, 2021 at 5:30 PM Reuven Lax  wrote:

> I believe that the Wait transform turns this output into a side input, so
> outputting the input PCollection might be problematic.
>
> On Wed, Mar 24, 2021 at 4:49 PM Kenneth Knowles  wrote:
>
>> Alex's idea sounds good and like what Vincent maybe implemented. I am
>> just reading really quickly so sorry if I missed something...
>>
>> Checking out the code for the WriteFn I see a big problem:
>>
>> @Setup
>> public void setup() {
>>   writer = new Mutator<>(spec, Mapper::saveAsync, "writes");
>> }
>>
>> @ProcessElement
>>   public void processElement(ProcessContext c) throws
>> ExecutionException, InterruptedException {
>>   writer.mutate(c.element());
>> }
>>
>> @Teardown
>> public void teardown() throws Exception {
>>   writer.close();
>>   writer = null;
>> }
>>
>> It is only in writer.close() that all async writes are waited on. This
>> needs to happen in @FinishBundle.
>>
>> Did you discover this when implementing your own Cassandra.Write?
>>
>> Until you have waited on the future, you should not output the element as
>> "has been written". And you cannot output from the @TearDown method which
>> is just for cleaning up resources.
>>
>> Am I reading this wrong?
>>
>> Kenn
>>
>> On Wed, Mar 24, 2021 at 4:35 PM Alex Amato  wrote:
>>
>>> How about a PCollection containing every element which was successfully
>>> written?
>>> Basically the same things which were passed into it.
>>>
>>> Then you could act on every element after it's been successfully written
>>> to the sink.
>>>
>>> On Wed, Mar 24, 2021 at 3:16 PM Robert Bradshaw 
>>> wrote:
>>>
 On Wed, Mar 24, 2021 at 2:36 PM Ismaël Mejía  wrote:

> +dev
>
> Since we all agree that we should return something different than
> PDone the real question is what should we return.
>

 My proposal is that one returns a PCollection that consists,
 internally, of something contentless like nulls. This is future compatible
 with returning something more meaningful based on the source or write
 process itself, but at least this would be followable.


> As a reminder we had a pretty interesting discussion about this
> already in the past but uniformization of our return values has not
> happened.
> This thread is worth reading for Vincent or anyone who wants to
> contribute Write transforms that return.
>
> https://lists.apache.org/thread.html/d1a4556a1e13a661cce19021926a5d0997fbbfde016d36989cf75a07%40%3Cdev.beam.apache.org%3E


 Yeah, we should go ahead and finally do something.


>
> > Returning PDone is an anti-pattern that should be avoided, but
> changing it now would be backwards incompatible.
>
> Periodic reminder most IOs are still Experimental so I suppose it is
> worth to the maintainers to judge if the upgrade to return something
> different from PDone is worth it; in that case we can deprecate and remove
> the previous signature in short time (2 releases was the average for
> previous cases).
>
>
> On Wed, Mar 24, 2021 at 10:24 PM Alexey Romanenko
>  wrote:
> >
> > I thought that was said about returning a PCollection of write
> results as it’s done in other IOs (as I mentioned as examples) that have
> _additional_ write methods, like “withWriteResults()” etc, that return
> PTransform<…, PCollection>.
> > In this case, we keep backwards compatibility and just add new
> functionality. Though, we need to follow the same pattern for user API and
> maybe even naming for this feature across different IOs (like we have for
> "readAll()” methods).
> >
> >  I agree that we have to avoid returning PDone for such cases.
> >
> > On 24 Mar 2021, at 20:05, Robert Bradshaw 
> wrote:
> >
> > Returning PDone is an anti-pattern that should be avoided, but
> changing it now would be backwards incompatible. PRs to add non-PDone
> returning variants (probably as another option to the builders) that
> compose well with Wait, etc. would be welcome.
> >
> > On Wed, Mar 24, 2021 at 11:14 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
> >>
> >> In this way, I think “Wait” PTransform should work for you but, as
> it was mentioned before, it 

Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Reuven Lax
I believe that the Wait transform turns this output into a side input, so
outputting the input PCollection might be problematic.

On Wed, Mar 24, 2021 at 4:49 PM Kenneth Knowles  wrote:

> Alex's idea sounds good and like what Vincent maybe implemented. I am just
> reading really quickly so sorry if I missed something...
>
> Checking out the code for the WriteFn I see a big problem:
>
> @Setup
> public void setup() {
>   writer = new Mutator<>(spec, Mapper::saveAsync, "writes");
> }
>
> @ProcessElement
>   public void processElement(ProcessContext c) throws
> ExecutionException, InterruptedException {
>   writer.mutate(c.element());
> }
>
> @Teardown
> public void teardown() throws Exception {
>   writer.close();
>   writer = null;
> }
>
> It is only in writer.close() that all async writes are waited on. This
> needs to happen in @FinishBundle.
>
> Did you discover this when implementing your own Cassandra.Write?
>
> Until you have waited on the future, you should not output the element as
> "has been written". And you cannot output from the @TearDown method which
> is just for cleaning up resources.
>
> Am I reading this wrong?
>
> Kenn
>
> On Wed, Mar 24, 2021 at 4:35 PM Alex Amato  wrote:
>
>> How about a PCollection containing every element which was successfully
>> written?
>> Basically the same things which were passed into it.
>>
>> Then you could act on every element after it's been successfully written
>> to the sink.
>>
>> On Wed, Mar 24, 2021 at 3:16 PM Robert Bradshaw 
>> wrote:
>>
>>> On Wed, Mar 24, 2021 at 2:36 PM Ismaël Mejía  wrote:
>>>
 +dev

 Since we all agree that we should return something different than
 PDone the real question is what should we return.

>>>
>>> My proposal is that one returns a PCollection that consists,
>>> internally, of something contentless like nulls. This is future compatible
>>> with returning something more meaningful based on the source or write
>>> process itself, but at least this would be followable.
>>>
>>>
 As a reminder we had a pretty interesting discussion about this
 already in the past but uniformization of our return values has not
 happened.
 This thread is worth reading for Vincent or anyone who wants to
 contribute Write transforms that return.

 https://lists.apache.org/thread.html/d1a4556a1e13a661cce19021926a5d0997fbbfde016d36989cf75a07%40%3Cdev.beam.apache.org%3E
>>>
>>>
>>> Yeah, we should go ahead and finally do something.
>>>
>>>

 > Returning PDone is an anti-pattern that should be avoided, but
 changing it now would be backwards incompatible.

 Periodic reminder most IOs are still Experimental so I suppose it is
 worth to the maintainers to judge if the upgrade to return something
 different from PDone is worth it; in that case we can deprecate and remove
 the previous signature in short time (2 releases was the average for
 previous cases).


 On Wed, Mar 24, 2021 at 10:24 PM Alexey Romanenko
  wrote:
 >
 > I thought that was said about returning a PCollection of write
 results as it’s done in other IOs (as I mentioned as examples) that have
 _additional_ write methods, like “withWriteResults()” etc, that return
 PTransform<…, PCollection>.
 > In this case, we keep backwards compatibility and just add new
 functionality. Though, we need to follow the same pattern for user API and
 maybe even naming for this feature across different IOs (like we have for
 "readAll()” methods).
 >
 >  I agree that we have to avoid returning PDone for such cases.
 >
 > On 24 Mar 2021, at 20:05, Robert Bradshaw 
 wrote:
 >
 > Returning PDone is an anti-pattern that should be avoided, but
 changing it now would be backwards incompatible. PRs to add non-PDone
 returning variants (probably as another option to the builders) that
 compose well with Wait, etc. would be welcome.
 >
 > On Wed, Mar 24, 2021 at 11:14 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:
 >>
 >> In this way, I think “Wait” PTransform should work for you but, as
 it was mentioned before, it doesn’t work with PDone, only with PCollection
 as a signal.
 >>
 >> Since you already adjusted your own writer for that, it would be
 great to contribute it back to Beam in the way as it was done for other IOs
 (for example, JdbcIO [1] or BigtableIO [2])
 >>
 >> In general, I think we need to have it for all IOs, at least to use
 with “Wait” because this pattern it's quite often required.
 >>
 >> [1]
 https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L1078
 >> [2]
 

BEAM-3713: Moving from nose to pytest

2021-03-24 Thread Benjamin Gonzalez Delgado
Hi team,
I am planning to work on BEAM-3713
, but I see there are PRs
related to the task.
Could someone guide me on what work remains for the
migration from nose to pytest?
Any guidance on this would be appreciated.

Thanks!
Benjamin



BEAM-3304 - Go Trigger RuntimeError question

2021-03-24 Thread Eduardo Barrera
Hello Team,

We're testing an initial and very basic implementation of a trigger. While
testing different triggers like *Always* or *Never* using the
windowed_wordcount.go example and graphx/translate.go, we get this error
message:

RuntimeError: process bundle failed for instruction bundle_147 using plan 1
: while executing Process for Plan[1]:
2: DataSink[S[CoGBK/Write@localhost:63851]]
Coder:W;c9_windowed>!GWC
3: ParDo[beam.addFixedKeyFn] Out:[2]
4: WindowInto[GLO]. Out:3
5: ParDo[main.formatFn] Out:[4]
6: ExtractOutput[stats.sumIntFn] Keyed:false Out:5
7: MergeAccumulators[stats.sumIntFn] Keyed:false Out:6
1: DataSource[S[CombinePerKey/Group/Read@localhost:63851], out]
Coder:W;c8_windowed>!IWC Out:7
caused by:
stream value decode failed
caused by:
invalid varintz encoding for: []
Remote logging shutting down.exit status 1

We believe that this is an error caused by the lack of a trigger
coder/encoder; however, it works with the Default trigger (which is set by
default in graphx/translate.go).

Any guidance on this would be appreciated.

Thanks!
Eduardo



Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Kenneth Knowles
Alex's idea sounds good and like what Vincent maybe implemented. I am just
reading really quickly so sorry if I missed something...

Checking out the code for the WriteFn I see a big problem:

@Setup
public void setup() {
  writer = new Mutator<>(spec, Mapper::saveAsync, "writes");
}

@ProcessElement
public void processElement(ProcessContext c)
    throws ExecutionException, InterruptedException {
  writer.mutate(c.element());
}

@Teardown
public void teardown() throws Exception {
  writer.close();
  writer = null;
}

It is only in writer.close() that all async writes are waited on. This
needs to happen in @FinishBundle.

Did you discover this when implementing your own Cassandra.Write?

Until you have waited on the future, you should not output the element as
"has been written". And you cannot output from the @TearDown method which
is just for cleaning up resources.
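
For illustration only, the shape of the fix would be roughly the following (a
sketch reusing the fields from the snippet above, not a patch to CassandraIO;
the flush/wait call is hypothetical, since the real Mutator API may differ):

@Setup
public void setup() {
  writer = new Mutator<>(spec, Mapper::saveAsync, "writes");
}

@ProcessElement
public void processElement(ProcessContext c) throws Exception {
  writer.mutate(c.element()); // enqueue the async write
}

@FinishBundle
public void finishBundle() throws Exception {
  // Block until every async write issued in this bundle has completed, so the
  // bundle is only committed once the data has actually been written.
  writer.waitForOutstandingWrites(); // hypothetical method name
}

@Teardown
public void teardown() throws Exception {
  writer.close(); // teardown only releases resources; nothing is flushed here
  writer = null;
}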

Am I reading this wrong?

Kenn

On Wed, Mar 24, 2021 at 4:35 PM Alex Amato  wrote:

> How about a PCollection containing every element which was successfully
> written?
> Basically the same things which were passed into it.
>
> Then you could act on every element after it's been successfully written to
> the sink.
>
> On Wed, Mar 24, 2021 at 3:16 PM Robert Bradshaw 
> wrote:
>
>> On Wed, Mar 24, 2021 at 2:36 PM Ismaël Mejía  wrote:
>>
>>> +dev
>>>
>>> Since we all agree that we should return something different than
>>> PDone the real question is what should we return.
>>>
>>
>> My proposal is that one returns a PCollection that consists,
>> internally, of something contentless like nulls. This is future compatible
>> with returning something more meaningful based on the source or write
>> process itself, but at least this would be followable.
>>
>>
>>> As a reminder we had a pretty interesting discussion about this
>>> already in the past but uniformization of our return values has not
>>> happened.
>>> This thread is worth reading for Vincent or anyone who wants to
>>> contribute Write transforms that return.
>>>
>>> https://lists.apache.org/thread.html/d1a4556a1e13a661cce19021926a5d0997fbbfde016d36989cf75a07%40%3Cdev.beam.apache.org%3E
>>
>>
>> Yeah, we should go ahead and finally do something.
>>
>>
>>>
>>> > Returning PDone is an anti-pattern that should be avoided, but
>>> changing it now would be backwards incompatible.
>>>
>>> Periodic reminder most IOs are still Experimental so I suppose it is
>>> worth to the maintainers to judge if the upgrade to return something
>>> different from PDone is worth it; in that case we can deprecate and remove
>>> the previous signature in short time (2 releases was the average for
>>> previous cases).
>>>
>>>
>>> On Wed, Mar 24, 2021 at 10:24 PM Alexey Romanenko
>>>  wrote:
>>> >
>>> > I thought that was said about returning a PCollection of write results
>>> as it’s done in other IOs (as I mentioned as examples) that have
>>> _additional_ write methods, like “withWriteResults()” etc, that return
>>> PTransform<…, PCollection>.
>>> > In this case, we keep backwards compatibility and just add new
>>> functionality. Though, we need to follow the same pattern for user API and
>>> maybe even naming for this feature across different IOs (like we have for
>>> "readAll()” methods).
>>> >
>>> >  I agree that we have to avoid returning PDone for such cases.
>>> >
>>> > On 24 Mar 2021, at 20:05, Robert Bradshaw  wrote:
>>> >
>>> > Returning PDone is an anti-pattern that should be avoided, but
>>> changing it now would be backwards incompatible. PRs to add non-PDone
>>> returning variants (probably as another option to the builders) that
>>> compose well with Wait, etc. would be welcome.
>>> >
>>> > On Wed, Mar 24, 2021 at 11:14 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>> >>
>>> >> In this way, I think “Wait” PTransform should work for you but, as it
>>> was mentioned before, it doesn’t work with PDone, only with PCollection as
>>> a signal.
>>> >>
>>> >> Since you already adjusted your own writer for that, it would be
>>> great to contribute it back to Beam in the way as it was done for other IOs
>>> (for example, JdbcIO [1] or BigtableIO [2])
>>> >>
>>> >> In general, I think we need to have it for all IOs, at least to use
>>> with “Wait” because this pattern it's quite often required.
>>> >>
>>> >> [1]
>>> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L1078
>>> >> [2]
>>> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L715
>>> >>
>>> >> On 24 Mar 2021, at 18:01, Vincent Marquez 
>>> wrote:
>>> >>
>>> >> No, it only needs to ensure that one record seen on Pubsub has
>>> successfully written to a database.  So "record by record" is fine, or even
>>> "bundle".
>>> >>
>>> >> ~Vincent
>>> >>
>>> >>
>>> >> On Wed, 

Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Alex Amato
How about a PCollection containing every element which was successfully
written?
Basically the same things which were passed into it.

Then you could act on every element after it's been successfully written to
the sink.

On Wed, Mar 24, 2021 at 3:16 PM Robert Bradshaw  wrote:

> On Wed, Mar 24, 2021 at 2:36 PM Ismaël Mejía  wrote:
>
>> +dev
>>
>> Since we all agree that we should return something different than
>> PDone the real question is what should we return.
>>
>
> My proposal is that one returns a PCollection that consists,
> internally, of something contentless like nulls. This is future compatible
> with returning something more meaningful based on the source or write
> process itself, but at least this would be followable.
>
>
>> As a reminder we had a pretty interesting discussion about this
>> already in the past but uniformization of our return values has not
>> happened.
>> This thread is worth reading for Vincent or anyone who wants to
>> contribute Write transforms that return.
>>
>> https://lists.apache.org/thread.html/d1a4556a1e13a661cce19021926a5d0997fbbfde016d36989cf75a07%40%3Cdev.beam.apache.org%3E
>
>
> Yeah, we should go ahead and finally do something.
>
>
>>
>> > Returning PDone is an anti-pattern that should be avoided, but changing
>> it now would be backwards incompatible.
>>
>> Periodic reminder most IOs are still Experimental so I suppose it is
>> worth to the maintainers to judge if the upgrade to return something
>> different from PDone is worth it; in that case we can deprecate and remove
>> the previous signature in short time (2 releases was the average for
>> previous cases).
>>
>>
>> On Wed, Mar 24, 2021 at 10:24 PM Alexey Romanenko
>>  wrote:
>> >
>> > I thought that was said about returning a PCollection of write results
>> as it’s done in other IOs (as I mentioned as examples) that have
>> _additional_ write methods, like “withWriteResults()” etc, that return
>> PTransform<…, PCollection>.
>> > In this case, we keep backwards compatibility and just add new
>> functionality. Though, we need to follow the same pattern for user API and
>> maybe even naming for this feature across different IOs (like we have for
>> "readAll()” methods).
>> >
>> >  I agree that we have to avoid returning PDone for such cases.
>> >
>> > On 24 Mar 2021, at 20:05, Robert Bradshaw  wrote:
>> >
>> > Returning PDone is an anti-pattern that should be avoided, but changing
>> it now would be backwards incompatible. PRs to add non-PDone returning
>> variants (probably as another option to the builders) that compose well
>> with Wait, etc. would be welcome.
>> >
>> > On Wed, Mar 24, 2021 at 11:14 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>> >>
>> >> In this way, I think “Wait” PTransform should work for you but, as it
>> was mentioned before, it doesn’t work with PDone, only with PCollection as
>> a signal.
>> >>
>> >> Since you already adjusted your own writer for that, it would be great
>> to contribute it back to Beam in the way as it was done for other IOs (for
>> example, JdbcIO [1] or BigtableIO [2])
>> >>
>> >> In general, I think we need to have it for all IOs, at least to use
>> with “Wait” because this pattern it's quite often required.
>> >>
>> >> [1]
>> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L1078
>> >> [2]
>> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L715
>> >>
>> >> On 24 Mar 2021, at 18:01, Vincent Marquez 
>> wrote:
>> >>
>> >> No, it only needs to ensure that one record seen on Pubsub has
>> successfully written to a database.  So "record by record" is fine, or even
>> "bundle".
>> >>
>> >> ~Vincent
>> >>
>> >>
>> >> On Wed, Mar 24, 2021 at 9:49 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>> >>>
>> >>> Do you want to wait for ALL records to be written to Cassandra and
>> then write all successfully written records to PubSub, or should it be
>> performed "record by record"?
>> >>>
>> >>> On 24 Mar 2021, at 04:58, Vincent Marquez 
>> wrote:
>> >>>
>> >>> I have a common use case where my pipeline looks like this:
>> >>> CassandraIO.readAll -> Aggregate -> CassandraIO.write ->
>> PubSubIO.write
>> >>>
>> >>> I do NOT want my pipeline to look like the following:
>> >>>
>> >>> CassandraIO.readAll -> Aggregate -> CassandraIO.write
>> >>>  |
>> >>>   ->
>> PubsubIO.write
>> >>>
>> >>> Because I need to ensure that only items written to Pubsub have
>> successfully finished a (quorum) write.
>> >>>
>> >>> Since CassandraIO.write is a PTransform I can't actually
>> use it here so I often roll my own 'writer', but maybe there is a
>> recommended way of doing this?
>> >>>
>> >>> Thanks in advance for any help.
>> 

Flaky test issue report

2021-03-24 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests. These are P1 issues 
because they have a major negative impact on the community and make it hard to 
determine the quality of the software.

BEAM-12020: :sdks:java:container:java8:docker failing missing licenses 
(https://issues.apache.org/jira/browse/BEAM-12020)
BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (https://issues.apache.org/jira/browse/BEAM-12019)
BEAM-11792: Python precommit failed (flaked?) installing package  
(https://issues.apache.org/jira/browse/BEAM-11792)
BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (https://issues.apache.org/jira/browse/BEAM-11666)
BEAM-11661: hdfsIntegrationTest flake: network not found (py38 postcommit) 
(https://issues.apache.org/jira/browse/BEAM-11661)
BEAM-11646: beam_PostCommit_XVR_Spark failing 
(https://issues.apache.org/jira/browse/BEAM-11646)
BEAM-11645: beam_PostCommit_XVR_Flink failing 
(https://issues.apache.org/jira/browse/BEAM-11645)
BEAM-11540: Linter sometimes flakes on apache_beam.dataframe.frames_test 
(https://issues.apache.org/jira/browse/BEAM-11540)
BEAM-11493: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyAndWindows
 (https://issues.apache.org/jira/browse/BEAM-11493)
BEAM-11492: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMergingWindows
 (https://issues.apache.org/jira/browse/BEAM-11492)
BEAM-11491: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMultipleWindows
 (https://issues.apache.org/jira/browse/BEAM-11491)
BEAM-11490: Spark test failure: 
org.apache.beam.sdk.transforms.ReifyTimestampsTest.inValuesSucceeds 
(https://issues.apache.org/jira/browse/BEAM-11490)
BEAM-11489: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-11489)
BEAM-11488: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedCounterMetrics
 (https://issues.apache.org/jira/browse/BEAM-11488)
BEAM-11487: Spark test failure: 
org.apache.beam.sdk.transforms.WithTimestampsTest.withTimestampsShouldApplyTimestamps
 (https://issues.apache.org/jira/browse/BEAM-11487)
BEAM-11486: Spark test failure: 
org.apache.beam.sdk.testing.PAssertTest.testSerializablePredicate 
(https://issues.apache.org/jira/browse/BEAM-11486)
BEAM-11485: Spark test failure: 
org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineNullValues 
(https://issues.apache.org/jira/browse/BEAM-11485)
BEAM-11484: Spark test failure: 
org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics 
(https://issues.apache.org/jira/browse/BEAM-11484)
BEAM-11483: Spark PostCommit Test Improvements 
(https://issues.apache.org/jira/browse/BEAM-11483)
BEAM-10987: stager_test.py::StagerTest::test_with_main_session flaky on 
windows py3.6,3.7 (https://issues.apache.org/jira/browse/BEAM-10987)
BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-10968)
BEAM-10955: Flink Java Runner test flake: Could not find Flink job  
(https://issues.apache.org/jira/browse/BEAM-10955)
BEAM-10923: Python requirements installation in docker container is flaky 
(https://issues.apache.org/jira/browse/BEAM-10923)
BEAM-10901: Flaky test: 
PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection
 (https://issues.apache.org/jira/browse/BEAM-10901)
BEAM-10866: PortableRunnerTestWithSubprocesses.test_register_finalizations 
flaky on macOS (https://issues.apache.org/jira/browse/BEAM-10866)
BEAM-10763: Spotless flake (NullPointerException) 
(https://issues.apache.org/jira/browse/BEAM-10763)
BEAM-10590: BigQueryQueryToTableIT flaky: test_big_query_new_types 
(https://issues.apache.org/jira/browse/BEAM-10590)
BEAM-10589: Samza ValidatesRunner failure: 
testParDoWithSideInputsIsCumulative 
(https://issues.apache.org/jira/browse/BEAM-10589)
BEAM-10519: 
MultipleInputsAndOutputTests.testParDoWithSideInputsIsCumulative flaky on Samza 
(https://issues.apache.org/jira/browse/BEAM-10519)
BEAM-10504: Failure / flake in ElasticSearchIOTest > 
testWriteFullAddressing and testWriteWithIndexFn 
(https://issues.apache.org/jira/browse/BEAM-10504)
BEAM-10501: CheckGrafanaStalenessAlerts and PingGrafanaHttpApi fail with 
Connection refused (https://issues.apache.org/jira/browse/BEAM-10501)
BEAM-10485: Failure / flake: ElasticsearchIOTest > testWriteWithIndexFn 
(https://issues.apache.org/jira/browse/BEAM-10485)
BEAM-10272: Failure in CassandraIOTest init: cannot create cluster due to 
netty link error 

P1 issues report

2021-03-24 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests.

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

BEAM-12041: Project beam_PostCommit_Java_ValidatesRunner_ULR is failing 
(https://issues.apache.org/jira/browse/BEAM-12041)
BEAM-12021: PubsubReadIT failures: "Cannot nackAll on persisting 
checkpoint" (https://issues.apache.org/jira/browse/BEAM-12021)
BEAM-12000: Support Python 3.9 in Apache Beam 
(https://issues.apache.org/jira/browse/BEAM-12000)
BEAM-11961: InfluxDBIOIT failing with unauthorized error 
(https://issues.apache.org/jira/browse/BEAM-11961)
BEAM-11959: Python Beam SDK Harness hangs when install pip packages 
(https://issues.apache.org/jira/browse/BEAM-11959)
BEAM-11922: 
org.apache.beam.examples.cookbook.MapClassIntegrationIT.testDataflowMapState 
has been failing in master (https://issues.apache.org/jira/browse/BEAM-11922)
BEAM-11886: MapState and SetState failing tests on Dataflow streaming 
(https://issues.apache.org/jira/browse/BEAM-11886)
BEAM-11862: Write To Kafka does not work 
(https://issues.apache.org/jira/browse/BEAM-11862)
BEAM-11828: JmsIO is not acknowledging messages correctly 
(https://issues.apache.org/jira/browse/BEAM-11828)
BEAM-11815: Fail to read more than 1M of items 
(https://issues.apache.org/jira/browse/BEAM-11815)
BEAM-11772: GCP BigQuery sink (file loads) uses runner determined sharding 
for unbounded data (https://issues.apache.org/jira/browse/BEAM-11772)
BEAM-11755: Cross-language consistency (RequiresStableInputs) is quietly 
broken (at least on portable flink runner) 
(https://issues.apache.org/jira/browse/BEAM-11755)
BEAM-11578: `dataflow_metrics` (python) fails with TypeError (when int 
overflowing?) (https://issues.apache.org/jira/browse/BEAM-11578)
BEAM-11434: Expose Spanner admin/batch clients in Spanner Accessor 
(https://issues.apache.org/jira/browse/BEAM-11434)
BEAM-11227: Upgrade beam-vendor-grpc-1_26_0-0.3 to fix CVE-2020-27216 
(https://issues.apache.org/jira/browse/BEAM-11227)
BEAM-11148: Kafka commitOffsetsInFinalize OOM on Flink 
(https://issues.apache.org/jira/browse/BEAM-11148)
BEAM-11017: Timer with dataflow runner can be set multiple times (dataflow 
runner) (https://issues.apache.org/jira/browse/BEAM-11017)
BEAM-10883: XmlIO parsing of multibyte characters 
(https://issues.apache.org/jira/browse/BEAM-10883)
BEAM-10861: Adds URNs and payloads to PubSub transforms 
(https://issues.apache.org/jira/browse/BEAM-10861)
BEAM-10663: CrossLanguageKafkaIOTest broken on Flink Runner 
(https://issues.apache.org/jira/browse/BEAM-10663)
BEAM-10573: CSV files are loaded several times if they are too large 
(https://issues.apache.org/jira/browse/BEAM-10573)
BEAM-10569: SpannerIO tests don't actually assert anything. 
(https://issues.apache.org/jira/browse/BEAM-10569)
BEAM-10288: Quickstart documents are out of date 
(https://issues.apache.org/jira/browse/BEAM-10288)
BEAM-10244: Populate requirements cache fails on poetry-based packages 
(https://issues.apache.org/jira/browse/BEAM-10244)
BEAM-10100: FileIO writeDynamic with AvroIO.sink not writing all data 
(https://issues.apache.org/jira/browse/BEAM-10100)
BEAM-9917: BigQueryBatchFileLoads dynamic destination 
(https://issues.apache.org/jira/browse/BEAM-9917)
BEAM-9564: Remove insecure ssl options from MongoDBIO 
(https://issues.apache.org/jira/browse/BEAM-9564)
BEAM-9455: Environment-sensitive provisioning for Dataflow 
(https://issues.apache.org/jira/browse/BEAM-9455)
BEAM-9154: Move Chicago Taxi Example to Python 3 
(https://issues.apache.org/jira/browse/BEAM-9154)
BEAM-8407: [SQL] Some Hive tests throw NullPointerException, but get marked 
as passing (Direct Runner) (https://issues.apache.org/jira/browse/BEAM-8407)
BEAM-7717: PubsubIO watermark tracking hovers near start of epoch 
(https://issues.apache.org/jira/browse/BEAM-7717)
BEAM-7716: PubsubIO returns empty message bodies for all messages read 
(https://issues.apache.org/jira/browse/BEAM-7716)
BEAM-7195: BigQuery - 404 errors for 'table not found' when using dynamic 
destinations - sometimes, new table fails to get created 
(https://issues.apache.org/jira/browse/BEAM-7195)
BEAM-7064: Conversion of timestamp from BigQuery row to Beam row loses 
precision (https://issues.apache.org/jira/browse/BEAM-7064)
BEAM-6839: User reports protobuf ClassChangeError running against 2.6.0 or 
above (https://issues.apache.org/jira/browse/BEAM-6839)
BEAM-6466: KafkaIO doesn't commit offsets while being used as bounded 
source (https://issues.apache.org/jira/browse/BEAM-6466)
BEAM-5997: EVENT_TIME timer throws exception when side input used 
(https://issues.apache.org/jira/browse/BEAM-5997)
BEAM-5305: Timeout handling in JdbcIO with Oracle java driver 
(https://issues.apache.org/jira/browse/BEAM-5305)

Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Robert Bradshaw
On Wed, Mar 24, 2021 at 2:36 PM Ismaël Mejía  wrote:

> +dev
>
> Since we all agree that we should return something different than
> PDone the real question is what should we return.
>

My proposal is that one returns a PCollection that consists, internally,
of something contentless like nulls. This is future compatible
with returning something more meaningful based on the source or write
process itself, but at least this would be followable.
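
As a rough sketch of that shape (the names are made up; this is an
illustration of returning a followable, contentless PCollection instead of
PDone, not a proposed API):

import org.apache.beam.sdk.coders.VoidCoder;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

class WriteToSomething extends PTransform<PCollection<String>, PCollection<Void>> {
  @Override
  public PCollection<Void> expand(PCollection<String> input) {
    return input
        .apply("WriteRecords", ParDo.of(new DoFn<String, Void>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            // ... perform the actual write of c.element() here (elided) ...
            c.output(null); // contentless signal, emitted once the write happened
          }
        }))
        .setCoder(VoidCoder.of());
  }
}

A caller can then follow the write (for example with Wait.on), and the return
type can later be enriched to carry real write results without changing the
overall shape.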


> As a reminder we had a pretty interesting discussion about this
> already in the past but uniformization of our return values has not
> happened.
> This thread is worth reading for Vincent or anyone who wants to
> contribute Write transforms that return.
>
> https://lists.apache.org/thread.html/d1a4556a1e13a661cce19021926a5d0997fbbfde016d36989cf75a07%40%3Cdev.beam.apache.org%3E


Yeah, we should go ahead and finally do something.


>
> > Returning PDone is an anti-pattern that should be avoided, but changing
> it now would be backwards incompatible.
>
> Periodic reminder most IOs are still Experimental so I suppose it is
> worth to the maintainers to judge if the upgrade to return something
> different from PDone is worth it; in that case we can deprecate and remove
> the previous signature in short time (2 releases was the average for
> previous cases).
>
>
> On Wed, Mar 24, 2021 at 10:24 PM Alexey Romanenko
>  wrote:
> >
> > I thought that was said about returning a PCollection of write results
> as it’s done in other IOs (as I mentioned as examples) that have
> _additional_ write methods, like “withWriteResults()” etc, that return
> PTransform<…, PCollection>.
> > In this case, we keep backwards compatibility and just add new
> functionality. Though, we need to follow the same pattern for user API and
> maybe even naming for this feature across different IOs (like we have for
> "readAll()” methods).
> >
> >  I agree that we have to avoid returning PDone for such cases.
> >
> > On 24 Mar 2021, at 20:05, Robert Bradshaw  wrote:
> >
> > Returning PDone is an anti-pattern that should be avoided, but changing
> it now would be backwards incompatible. PRs to add non-PDone returning
> variants (probably as another option to the builders) that compose well
> with Wait, etc. would be welcome.
> >
> > On Wed, Mar 24, 2021 at 11:14 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
> >>
> >> In this way, I think “Wait” PTransform should work for you but, as it
> was mentioned before, it doesn’t work with PDone, only with PCollection as
> a signal.
> >>
> >> Since you already adjusted your own writer for that, it would be great
> to contribute it back to Beam in the way as it was done for other IOs (for
> example, JdbcIO [1] or BigtableIO [2])
> >>
> >> In general, I think we need to have it for all IOs, at least to use
> with “Wait” because this pattern it's quite often required.
> >>
> >> [1]
> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L1078
> >> [2]
> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L715
> >>
> >> On 24 Mar 2021, at 18:01, Vincent Marquez 
> wrote:
> >>
> >> No, it only needs to ensure that one record seen on Pubsub has
> successfully written to a database.  So "record by record" is fine, or even
> "bundle".
> >>
> >> ~Vincent
> >>
> >>
> >> On Wed, Mar 24, 2021 at 9:49 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
> >>>
> >>> Do you want to wait for ALL records to be written to Cassandra and then
> write all successfully written records to PubSub, or should it be performed
> "record by record"?
> >>>
> >>> On 24 Mar 2021, at 04:58, Vincent Marquez 
> wrote:
> >>>
> >>> I have a common use case where my pipeline looks like this:
> >>> CassandraIO.readAll -> Aggregate -> CassandraIO.write -> PubSubIO.write
> >>>
> >>> I do NOT want my pipeline to look like the following:
> >>>
> >>> CassandraIO.readAll -> Aggregate -> CassandraIO.write
> >>>  |
> >>>   ->
> PubsubIO.write
> >>>
> >>> Because I need to ensure that only items written to Pubsub have
> successfully finished a (quorum) write.
> >>>
> >>> Since CassandraIO.write is a PTransform I can't actually use
> it here so I often roll my own 'writer', but maybe there is a recommended
> way of doing this?
> >>>
> >>> Thanks in advance for any help.
> >>>
> >>> ~Vincent
> >>>
> >>>
> >>
> >
>


Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Ismaël Mejía
+dev

Since we all agree that we should return something different than
PDone, the real question is what we should return.
As a reminder we had a pretty interesting discussion about this
already in the past but uniformization of our return values has not
happened.
This thread is worth reading for Vincent or anyone who wants to
contribute Write transforms that return.
https://lists.apache.org/thread.html/d1a4556a1e13a661cce19021926a5d0997fbbfde016d36989cf75a07%40%3Cdev.beam.apache.org%3E

> Returning PDone is an anti-pattern that should be avoided, but changing it 
> now would be backwards incompatible.

Periodic reminder: most IOs are still Experimental, so I suppose it is up
to the maintainers to judge whether upgrading to return something
different from PDone is worth it; in that case we can deprecate and remove
the previous signature in a short time (2 releases was the average for
previous cases).


On Wed, Mar 24, 2021 at 10:24 PM Alexey Romanenko
 wrote:
>
> I thought that was said about returning a PCollection of write results as 
> it’s done in other IOs (as I mentioned as examples) that have _additional_ 
> write methods, like “withWriteResults()” etc, that return PTransform<…, 
> PCollection>.
> In this case, we keep backwards compatibility and just add new functionality.
> Though, we need to follow the same pattern for user API and maybe even naming 
> for this feature across different IOs (like we have for "readAll()” methods).
>
>  I agree that we have to avoid returning PDone for such cases.
>
> On 24 Mar 2021, at 20:05, Robert Bradshaw  wrote:
>
> Returning PDone is an anti-pattern that should be avoided, but changing it 
> now would be backwards incompatible. PRs to add non-PDone returning variants 
> (probably as another option to the builders) that compose well with Wait, 
> etc. would be welcome.
>
> On Wed, Mar 24, 2021 at 11:14 AM Alexey Romanenko  
> wrote:
>>
>> In this way, I think “Wait” PTransform should work for you but, as it was 
>> mentioned before, it doesn’t work with PDone, only with PCollection as a 
>> signal.
>>
>> Since you already adjusted your own writer for that, it would be great to 
>> contribute it back to Beam in the way as it was done for other IOs (for 
>> example, JdbcIO [1] or BigtableIO [2])
>>
>> In general, I think we need to have it for all IOs, at least to use with 
>> “Wait” because this pattern is quite often required.
>>
>> [1] 
>> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L1078
>> [2] 
>> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L715
>>
>> On 24 Mar 2021, at 18:01, Vincent Marquez  wrote:
>>
>> No, it only needs to ensure that one record seen on Pubsub has been successfully 
>> written to a database.  So "record by record" is fine, or even "bundle".
>>
>> ~Vincent
>>
>>
>> On Wed, Mar 24, 2021 at 9:49 AM Alexey Romanenko  
>> wrote:
>>>
>>> Do you want to wait for ALL records to be written to Cassandra and then 
>>> write all successfully written records to PubSub, or should it be performed 
>>> "record by record"?
>>>
>>> On 24 Mar 2021, at 04:58, Vincent Marquez  wrote:
>>>
>>> I have a common use case where my pipeline looks like this:
>>> CassandraIO.readAll -> Aggregate -> CassandraIO.write -> PubSubIO.write
>>>
>>> I do NOT want my pipeline to look like the following:
>>>
>>> CassandraIO.readAll -> Aggregate -> CassandraIO.write
>>>                           |
>>>                           -> PubsubIO.write
>>>
>>> Because I need to ensure that only items written to Pubsub have 
>>> successfully finished a (quorum) write.
>>>
>>> Since CassandraIO.write is a PTransform I can't actually use it 
>>> here so I often roll my own 'writer', but maybe there is a recommended way 
>>> of doing this?
>>>
>>> Thanks in advance for any help.
>>>
>>> ~Vincent
>>>
>>>
>>
>


BEAM-9185 Publish pre-release python artifacts (RCs) to PyPI

2021-03-24 Thread Josias Rico García
Hello devs


@Benjamin Gonzalez Delgado and I have been working on a solution for
Jira issue BEAM-9185, and we’ve done the following:

Background:

When executing choose_rc_commit.sh, a commit tag is created, and the
automated GitHub Actions workflow “build_wheels” detects the tag name that
contains the *rc* version and generates both RC and non-RC artifacts.
The Release Manager can execute build_release_candidate.sh as usual.

Modification:

At the end of stage 7. Build a release candidate:

   1. Added step "Upload release candidate to PyPI" to execute
      deploy_release_candidate_pypi.sh, which:
      1. Downloads the source distribution and wheels tagged as *rc*.
      2. Deploys the release candidate to PyPI.
         1. These artifacts will be uploaded to PyPI as a pre-release.
         2. A link will be added as part of the voting.
   2. Updated the *Release Guide*.



Important Notes:



   - According to the "Publishing pre-release artifacts to repositories"
     discussion thread, this snapshot is not targeted to users.
   - The voting process will remain the same: over the binaries in svn and not
     the RC.
   - Use the RC binaries in PyPI as needed: download them with pip install
     --pre (see the example below).
   - It is not necessary to rebuild after the voting.
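
For example, to validate a candidate from PyPI (the version string below is
illustrative; --pre is what makes pip consider pre-releases):

   # install a specific release candidate for validation
   pip install --pre apache-beam==2.29.0rc1

   # or simply take the newest available pre-release
   pip install --pre apache-beam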


Please let us know any comments or doubts. You can find the PR with the
complete changes here: https://github.com/apache/beam/pull/14325.

We would be pleased to hear from you.


-- 

Josias Misael Rico Garcia | WIZELINE

Technical Writer

josias.r...@wizeline.com



Re: Upgrading vendored gRPC from 1.26.0 to 1.36.0

2021-03-24 Thread Tomo Suzuki
Update: I observe that the Java precommit check is unstable in the PR to
upgrade vendored gRPC (compared with a PR with an empty change). There are
no constant failures; sometimes it succeeds and other times it faces
timeouts and flaky test failures.

https://github.com/apache/beam/pull/14295#issuecomment-806071087


On Mon, Mar 22, 2021 at 10:46 AM Tomo Suzuki  wrote:

> Thank you for the voting and I see the artifact available in Maven
> Central. I'll work on the PR to use the published artifact today.
>
> https://search.maven.org/artifact/org.apache.beam/beam-vendor-grpc-1_36_0/0.1/jar
>
> On Tue, Mar 16, 2021 at 3:07 PM Kenneth Knowles  wrote:
>
>> Update on this: there are some minor issues and then I'll send out the RC.
>>
>> I think this is worth blocking 2.29.0 release on, so I will do this
>> first. We are still eliminating other blockers from 2.29.0 anyhow.
>>
>> Kenn
>>
>> On Mon, Mar 15, 2021 at 7:17 AM Tomo Suzuki  wrote:
>>
>>> Hi Beam developers,
>>>
>>> I'm working on upgrading the vendored gRPC 1.36.0
>>> https://issues.apache.org/jira/browse/BEAM-11227 (PR:
>>> https://github.com/apache/beam/pull/14028)
>>> Let me know if you have any questions or concerns.
>>>
>>> Background:
I exchanged messages with Ismaël in BEAM-11227; it seems that the
ticket created by some automation is a false positive, but it's nice to use
an artifact that is not marked with a CVE.
>>>
>>> Kenn offered to work as the release manager (as in
>>> https://s.apache.org/beam-release-vendored-artifacts) of the vendored
>>> artifact.
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>
>
> --
> Regards,
> Tomo
>


-- 
Regards,
Tomo


Re: Make KafkaIO usable from Dataflow Template?

2021-03-24 Thread Alexey Romanenko
Robert, could you elaborate a bit why or point me out if it was already 
discussed?

> On 24 Mar 2021, at 00:11, Robert Bradshaw  wrote:
> 
> I would encourage flex templates over further proliferation of 
> ValueProviders. 
> 
> On Tue, Mar 23, 2021 at 12:42 PM Alexey Romanenko  > wrote:
> I think you are right - now with SDF support in KafkaIO it should be possible 
> to determine the number of splits in the runtime and support ValueProviders 
> in that way.
> 
> CC: Boyuan Zhang
> 
>> On 23 Mar 2021, at 18:18, Vincent Marquez > > wrote:
>> 
>> Hello.  I was looking at how to use the KafkaIO from a Dataflow Template, 
>> which requires all configuration options to be ValueProviders, which KafkaIO 
>> doesn't support currently.  I saw this old PR:
>> https://github.com/apache/beam/pull/6636/files 
>> 
>> 
>> I believe the reason this was never merged was there wasn't a good way to 
>> determine the number of splits to fire up in the UnboundedSource for 
>> KafkaIO.  However, now that KafkaIO is based on SplittableDoFn which handles 
>> splitting in a dynamic fashion, are there still any blockers to this?  
>> 
>> Could we change all the KafkaIO parameters to ValueProviders now? 
>> 
>> ~Vincent
> 



Re: BEAM-3304 - Go trigger question

2021-03-24 Thread Robert Burke
Thanks for your patience, my email client is presently a mess.

In this case, the root issue is the configuration of your Go development
setup. It's running the code, but depending on packages in the GOPATH
or GOROOT, not on your GitHub checkout.

To solve this, follow the instructions at
https://github.com/apache/beam/blob/master/sdks/go/README.md#contributing-to-the-go-sdk
which explain the correct nesting. The short version is that your git
clone needs to be *inside* the GOPATH.

This has certainly become more inconvenient since Go 1.16 was released last
month, as one now needs to set GO111MODULE=auto (see
https://blog.golang.org/go116-module-changes for details) because its default
changed to on. If you're using an older version of Go (versions at least as
far back as Go 1.12 work, as that's what the automated tests use), that
shouldn't be an issue.
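
For reference, a minimal sketch of that setup, assuming the default GOPATH
layout (paths and the clone URL are illustrative; use your own fork as
needed):

    export GOPATH=$(go env GOPATH)
    export GO111MODULE=auto          # only needed on Go 1.16+
    mkdir -p $GOPATH/src/github.com/apache
    cd $GOPATH/src/github.com/apache
    git clone https://github.com/apache/beam.git
    cd beam/sdks/go
    go test ./pkg/beam/...           # sanity check that this checkout is used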

The Go SDK is not currently 100% module compliant, but that is being worked
on in Q2.  I'm less certain about how that would change development though.
Would probably require some local overrides in the go.mod file... I digress.

 https://cwiki.apache.org/confluence/display/BEAM/Go+Tips has further tips
for setting up and developing the SDK itself, and testing.

Robert B



On Mon, 22 Mar 2021 at 16:11, César Cueva Orozco 
wrote:

> Hello Team,
>
> We are trying to test the following behavior by replacing the *default*
> trigger in WindowingStrategy:
>
> https://github.com/apache/beam/blob/3b2b07d251f58ec00a0056ac368b443fea2f10d9/sdks/go/pkg/beam/core/runtime/graphx/translate.go
>
> Before:
> ws := &pipepb.WindowingStrategy{
>     WindowFn:         windowFn,
>     MergeStatus:      pipepb.MergeStatus_NON_MERGING,
>     AccumulationMode: pipepb.AccumulationMode_DISCARDING,
>     WindowCoderId:    windowCoderId,
>     Trigger: &pipepb.Trigger{
>         Trigger: &pipepb.Trigger_Default_{
>             Default: &pipepb.Trigger_Default{},
>         },
>     },
>
> After:
> ws := &pipepb.WindowingStrategy{
>     WindowFn:         windowFn,
>     MergeStatus:      pipepb.MergeStatus_NON_MERGING,
>     AccumulationMode: pipepb.AccumulationMode_DISCARDING,
>     WindowCoderId:    windowCoderId,
>     Trigger: &pipepb.Trigger{
>         Trigger: &pipepb.Trigger_Always_{
>             Always: &pipepb.Trigger_Always{},
>         },
>     },
>
> Despite having changed the default code, the behavior doesn't change. See
> the below console output:
>
> windowing_strategies:
> key: "w0"
> value: <
>   window_fn: <
> urn: "beam:window_fn:global_windows:v1"
>   >
>   merge_status: NON_MERGING
>   window_coder_id: "c4"
>
>   trigger: <
> default: <
> >
>   >
>   accumulation_mode: DISCARDING
>   output_time: END_OF_WINDOW
>   closing_behavior: EMIT_IF_NONEMPTY
>   OnTimeBehavior: FIRE_ALWAYS
>   environment_id: "go"
>
> This is just to check if the runner acknowledges the change before we
> implement different triggers.
>
> Please let us know if we are missing something.
>
> Thank you,
> -Cesar
>
>
>
>
>
>
>
>
>


Re: Missing copyright notices due to LICENSE change

2021-03-24 Thread Robert Burke
I'm less concerned about the Go Doc at this point.

0. The Go SDK is still experimental (I know I know, almost there), making
it less severe of an issue.
1. The interpretation pkg.go.dev does is live with the release.
1.a That is, if they start accepting the Python 2.0 license blob, that's
fixed for all affected versions.
1.b Further, if we do cherrypick licence textual fixes into the older
versions for the licence matching, then the site will pick those up again.
2. Users can run the godoc tool themselves to get package
documentation if they want the docs of the version they're using, or simply
have the programming language server (for Go, gopls) lift that into their
IDEs as needed.
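
(For example, assuming the classic godoc tool is installed with
go get golang.org/x/tools/cmd/godoc:

    godoc -http=localhost:6060
    # then browse to http://localhost:6060/pkg/github.com/apache/beam/sdks/go/pkg/beam/

The package path is illustrative and depends on where the SDK is checked out
in your GOPATH.)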


On Mon, 22 Mar 2021 at 16:42, Ahmet Altay  wrote:

> For visibility: Change is cherry picked to 2.29.0 release branch (
> https://github.com/apache/beam/pull/14304).
>
> On Mon, Mar 22, 2021 at 12:37 PM Kenneth Knowles  wrote:
>
>> So if I understand correctly, the options for a correct license in the
>> released artifacts are:
>>
>>  - revert the change
>>  - build some automation for bundled jars
>>  - do something manual?
>>
>> Kenn
>>
>> On Mon, Mar 22, 2021 at 10:57 AM Brian Hulette 
>> wrote:
>>
>>> Pros/cons for a cherrypick:
>>> (+) Fixes regression for licenses in released Java artifacts.
>>> (-) It's possible it will permanently break docs on pkg.go.dev for
>>> 2.29.0 if https://github.com/golang/go/issues/45095 requires changes on
>>> our end (e.g. fixing the PSF License text).
>>>
>>> My sense is the pro outweighs the con here, but I could be convinced
>>> otherwise. I guess that makes me +0 for cherrypick.
>>>
>>> Brian
>>>
>>> On Mon, Mar 22, 2021 at 10:43 AM Ahmet Altay  wrote:
>>>


 On Mon, Mar 22, 2021 at 10:31 AM Kenneth Knowles 
 wrote:

> Is there a Jira marked as blocking 2.29.0 for the cherrypick?
>

 I do not think so. I have not filed a jira or started a cherry pick pr.

 Sorry, I was not sure if we agreed to cherry pick or not. Do you want
 me to do that?


>
> On Fri, Mar 19, 2021 at 6:16 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> I also noticed (with a help of an automated tool) that
>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/resources/NOTICES
>> includes additional licenses not included in
>> https://github.com/apache/beam/blob/master/LICENSE. Is that WAI
>> since Dataflow runner is released as a separate jar artifact, and the
>> licenses in question (GPL 2.0, CDDL) pertain to its dependencies, or we
>> need to include those licenses as well?
>>
>>
>>
>>
>> On Thu, Mar 18, 2021 at 9:51 AM Ahmet Altay  wrote:
>>
>>>
>>>
>>> On Thu, Mar 18, 2021 at 6:39 AM Brian Hulette 
>>> wrote:
>>>
 Thanks Robert! I'm +1 for reverting and engaging pkg.go.dev

 > and probably cherry pick it into the affected release branches.
 Even if we do this, the Java artifacts from the affected releases
 are missing the additional LICENSE text.

>>>
>>> IMO we can skip the cherry picks perhaps with the exception of the
>>> upcoming 2.29 release.
>>>

 > I do not know how to interpret this ASF guide. As an example from
 another project: airflow also has a LICENSE file, NOTICE file, and a
 licenses directory. There are even overlapping mentions.
 Agreed. I am a software engineer, not a lawyer, and even the ASF's
 guide that presumably targets engineers is not particularly clear to 
 me.
 This was just my tenuous understanding after a quick review.

>>>
>>> Agreed. We can ask LEGAL for further clarification.
>>>
>>>

 On Wed, Mar 17, 2021 at 7:49 PM Ahmet Altay 
 wrote:

> Thank you Rebo. I agree with reverting first and then figure out
> the next steps.
>
> Here is a PR to revert your change:
> https://github.com/apache/beam/pull/14267
>
> On Wed, Mar 17, 2021 at 4:02 PM Robert Burke 
> wrote:
>
>> Looking at the history it seems that before the python text was
>> added, pkg.go.dev can parse the license stack just fine. It
>> doesn't recognize the PSF license, and fails closed entirely as a 
>> result.
>>
>> I've filed an issue with pkg.go.dev (
>> https://github.com/golang/go/issues/45095). If the bug is fixed,
>> the affected versions will become visible as well.
>>
>> In the meantime, we should revert my change which clobbered the
>> other licenses and probably cherry pick it into the affected release
>> branches.
>>
>> The PSF license is annoying as it's explicitly unique. Nothing
>> but python 

Re: [ANNOUNCE] New committer: Ning Kang

2021-03-24 Thread Austin Bennett
Thanks, Ning!

On Wed, Mar 24, 2021 at 5:51 AM Reza Rokni  wrote:

> Congratulations!
>
> On Wed, Mar 24, 2021 at 12:53 PM Kenneth Knowles  wrote:
>
>> Congratulations and thanks for all your contributions, Ning!
>>
>> Kenn
>>
>> On Tue, Mar 23, 2021 at 4:10 PM Valentyn Tymofieiev 
>> wrote:
>>
>>> Congratulations, Ning, very well deserved!
>>>
>>> On Tue, Mar 23, 2021 at 2:01 PM Chamikara Jayalath 
>>> wrote:
>>>
 Congrats Ning!

 On Tue, Mar 23, 2021 at 1:23 PM Rui Wang  wrote:

> Congrats!
>
>
>
> -Rui
>
> On Tue, Mar 23, 2021 at 1:05 PM Yichi Zhang  wrote:
>
>> Congratulations Ning!
>>
>> On Tue, Mar 23, 2021 at 1:00 PM Robin Qiu  wrote:
>>
>>> Congratulations Ning!
>>>
>>> On Tue, Mar 23, 2021 at 12:56 PM Ahmet Altay 
>>> wrote:
>>>
 Congratulations Ning!

 On Tue, Mar 23, 2021 at 12:38 PM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Congrats, Ning Kang! Well deserved!
> Thank you for your contributions and users support!
>
> Alexey
>
> On 23 Mar 2021, at 20:35, Pablo Estrada 
> wrote:
>
> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Ning Kang.
>
> Ning has been working in Beam for a while. He has contributed to
> the interactive experience of the Python SDK, and developed a sidebar
> component, along with a release process for it. Ning has also helped 
> users
> on StackOverflow and user@, especially when it comes to
> Interactive Beam.
>
> Considering these contributions, the Beam PMC trusts Ning with the
> responsibilities of a Beam committer.[1]
>
> Thanks Ning!
> -P.
>
> [1] https://beam.apache.org/contribute/become-a-committer
> /#an-apache-beam-committer
>
>
>


Re: [ANNOUNCE] New committer: Ning Kang

2021-03-24 Thread Reza Rokni
Congratulations!

On Wed, Mar 24, 2021 at 12:53 PM Kenneth Knowles  wrote:

> Congratulations and thanks for all your contributions, Ning!
>
> Kenn
>
> On Tue, Mar 23, 2021 at 4:10 PM Valentyn Tymofieiev 
> wrote:
>
>> Congratulations, Ning, very well deserved!
>>
>> On Tue, Mar 23, 2021 at 2:01 PM Chamikara Jayalath 
>> wrote:
>>
>>> Congrats Ning!
>>>
>>> On Tue, Mar 23, 2021 at 1:23 PM Rui Wang  wrote:
>>>
 Congrats!



 -Rui

 On Tue, Mar 23, 2021 at 1:05 PM Yichi Zhang  wrote:

> Congratulations Ning!
>
> On Tue, Mar 23, 2021 at 1:00 PM Robin Qiu  wrote:
>
>> Congratulations Ning!
>>
>> On Tue, Mar 23, 2021 at 12:56 PM Ahmet Altay 
>> wrote:
>>
>>> Congratulations Ning!
>>>
>>> On Tue, Mar 23, 2021 at 12:38 PM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 Congrats, Ning Kang! Well deserved!
 Thank you for your contributions and users support!

 Alexey

 On 23 Mar 2021, at 20:35, Pablo Estrada  wrote:

 Hi all,

 Please join me and the rest of the Beam PMC in welcoming a new
 committer: Ning Kang.

 Ning has been working in Beam for a while. He has contributed to
 the interactive experience of the Python SDK, and developed a sidebar
 component, along with a release process for it. Ning has also helped 
 users
 on StackOverflow and user@, especially when it comes to
 Interactive Beam.

 Considering these contributions, the Beam PMC trusts Ning with the
 responsibilities of a Beam committer.[1]

 Thanks Ning!
 -P.

 [1] https://beam.apache.org/contribute/become-a-committer
 /#an-apache-beam-committer