Running apache_beam python sdk without c/c++ libs

2020-06-10 Thread Noah Goodrich
I am looking at using the Beam Python SDK in AWS Glue, but Glue doesn't
support Python libraries that are not pure Python (anything that is C/C++ based).

Can the Beam Python SDK and its runners be used without any C/C++ library
dependencies?


Re: Running apache_beam python sdk without c/c++ libs

2020-06-10 Thread Luke Cwik
Most runners are written in Java, and the others are cloud offerings that
wouldn't work for your use case. That limits you to the direct runner, which
is not meant for production or high-performance applications. The Beam Python
SDK uses Cython for performance reasons, but I don't believe it strictly
requires it, since many unit tests run both with and without Cython enabled.
Integrations between Beam and third-party libraries may require it, though,
so it likely depends on what you plan to do.

On Wed, Jun 10, 2020 at 8:17 AM Noah Goodrich  wrote:

> I am looking at using the Beam Python SDK in AWS Glue but it doesn't
> support non-native python libraries (anything that is c/c++ based).
>
> Is the Beam Python SDK / runners able to be used without any c/c++ library
> dependencies?
>


beam_PreCommit_Java_Phrase is hanging

2020-06-10 Thread Alexey Romanenko
Hello,

It seems that “beam_PreCommit_Java_Phrase” is hanging. The last job run was
triggered on 8th June [1], and new jobs can’t be started.
Could someone with Jenkins "master power" take a look at this?
Thanks!

[1] https://builds.apache.org/job/beam_PreCommit_Java_Phrase/2303/


Re: Running apache_beam python sdk without c/c++ libs

2020-06-10 Thread Luke Cwik
I'm not sure. It depends on whether the Spark -> Beam Python integration
will interfere with the magic built into AWS Glue.

On Wed, Jun 10, 2020 at 8:57 AM Noah Goodrich  wrote:

> I was hoping to use the Spark runner since Glue is just Spark with some
> magic on top. And in our specific use case, we'd be looking at working with
> S3, Kinesis, and MySQL RDS.
>
> Sounds like this is a non-starter?
>
> On Wed, Jun 10, 2020 at 9:33 AM Luke Cwik  wrote:
>
>> Most runners are written in Java while others are cloud offerings which
>> wouldn't work for your use case which limits you to use the direct runner
>> (not meant for production/high performance applications). Beam Python SDK
>> uses cython for performance reasons but I don't believe it strictly
>> requires it as many unit tests run with and without cython enabled.
>> Integrations between Beam and third party libraries may require it though
>> so it likely depends on what you plan to do.
>>
>> On Wed, Jun 10, 2020 at 8:17 AM Noah Goodrich 
>> wrote:
>>
>>> I am looking at using the Beam Python SDK in AWS Glue but it doesn't
>>> support non-native python libraries (anything that is c/c++ based).
>>>
>>> Is the Beam Python SDK / runners able to be used without any c/c++
>>> library dependencies?
>>>
>>


[ANNOUNCE] Beam 2.22.0 Released

2020-06-10 Thread Brian Hulette
The Apache Beam team is pleased to announce the release of version 2.22.0.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes bug fixes, features, and improvements detailed on
the Beam blog: https://beam.apache.org/blog/beam-2.22.0/

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.22.0.
-- Brian Hulette, on behalf of The Apache Beam team


Re: DRAFT - Beam board report June 2020

2020-06-10 Thread Chamikara Jayalath
Added some updates related to IO connectors support. Thanks.

On Tue, Jun 9, 2020 at 9:04 PM Jean-Baptiste Onofre  wrote:

> Hi,
>
> It looks good with the latest proposed changes.
>
> Regards
> JB
>
> Le 9 juin 2020 à 20:36, Kenneth Knowles  a écrit :
>
> Ping! It is now June, and time to submit this report. Please add
> interesting tidbits from the last quarter. Perhaps find highlights in
> https://github.com/apache/beam/blob/master/CHANGES.md
>
> Kenn
>
> On Wed, Mar 25, 2020 at 10:40 AM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> I just finally got a chance to finish and submit the March board report
>> (late, sorry).
>>
>> I want to have the board report draft available earlier so we can make
>> notes whenever things happen. Just like CHANGES.md is for the code, this is
>> for the project/community.
>>
>> https://s.apache.org/beam-report-2020-06
>>
>> You can read past reports at
>> https://whimsy.apache.org/board/minutes/Beam.html to get a feel for it.
>> Here are some specific examples of things that are good to add:
>>
>>  - interesting technical discussions that steer the project
>>  - major integrations with other projects
>>  - community events
>>  - major user facing addition/deprecation (like the Flink and Python
>> version and LTS discussions)
>>
>> It is OK to add very rough data and not be too careful with language. I
>> will play editor and make it all fit together.
>>
>> Kenn
>>
>
>


Re: Remove EOL'd Runners

2020-06-10 Thread David Morávek
+1

On Tue, Jun 9, 2020 at 7:43 PM Ahmet Altay  wrote:

> Thank you Tyson!
>
> On Tue, Jun 9, 2020 at 10:20 AM Thomas Weise  wrote:
>
>> +1
>>
>>
>> On Tue, Jun 9, 2020 at 9:41 AM Robert Bradshaw 
>> wrote:
>>
>>> Makes sense to me.
>>>
>>> On Tue, Jun 9, 2020 at 8:45 AM Maximilian Michels 
>>> wrote:
>>>
>>>> Thanks for the heads-up, Tyson! It's a sensible decision to remove
>>>> unsupported runners.
>>>>
>>>> -Max
>>>>
>>>> On 09.06.20 16:51, Tyson Hamilton wrote:
>>>> > Hi All,
>>>> >
>>>> > As part of the Fixit [1] I'd like to remove EOL'd runners, Apex and
>>>> > Gearpump, as described in BEAM- [2]. This will be a big PR I think and
>>>> > didn't want anyone to be surprised. There is already some agreement in
>>>> > the linked Jira issue. If there are no objections I'll get started later
>>>> > today or tomorrow.
>>>> >
>>>> > -Tyson
>>>> >
>>>> >
>>>> > [1]:
>>>> > https://lists.apache.org/thread.html/r9ddc77a8fee58ad02f68e2d9a7f054aab3e55717cc88ad1d5bc49311%40%3Cdev.beam.apache.org%3E
>>>> > [2]: https://issues.apache.org/jira/browse/BEAM-
>>>> >
>>>>
>>>


Question on NEXMark

2020-06-10 Thread Sruthi S Kumar
Hi,

We are working on a Flink project and enhancing some state backend
functionality. We are using the NEXMark benchmark to compare the performance
of different Flink state backends. While running NEXMark queries using the
Flink runner of Beam, we noticed quite a lot of reads of non-existent entries
from the state backend.

For example, when running query 11 with the RocksDB state backend, we had
around 368 successful reads and around 527 attempts to read non-existent
entries. We are curious whether that is intentional and, if so, what the
rationale behind it is.


-- 
Regards,

Sruthi


Re: Remove EOL'd Runners

2020-06-10 Thread Kenneth Knowles
+1

All Jenkins configs are in the repo. There's a lag between merge and run of
the "seed job" that syncs our configs. We can do a manual run of it, or
just not worry about the temporary redness in the jobs that will be deleted
anyhow.

On Wed, Jun 10, 2020 at 8:57 AM Jan Lukavský  wrote:

> +1
> On 6/10/20 5:51 PM, David Morávek wrote:
>
> +1
>
> On Tue, Jun 9, 2020 at 7:43 PM Ahmet Altay  wrote:
>
>> Thank you Tyson!
>>
>> On Tue, Jun 9, 2020 at 10:20 AM Thomas Weise  wrote:
>>
>>> +1
>>>
>>>
>>> On Tue, Jun 9, 2020 at 9:41 AM Robert Bradshaw 
>>> wrote:
>>>
>>>> Makes sense to me.
>>>>
>>>> On Tue, Jun 9, 2020 at 8:45 AM Maximilian Michels
>>>> wrote:
>>>>
>>>>> Thanks for the heads-up, Tyson! It's a sensible decision to remove
>>>>> unsupported runners.
>>>>>
>>>>> -Max
>>>>>
>>>>> On 09.06.20 16:51, Tyson Hamilton wrote:
>>>>> > Hi All,
>>>>> >
>>>>> > As part of the Fixit [1] I'd like to remove EOL'd runners, Apex and
>>>>> > Gearpump, as described in BEAM- [2]. This will be a big PR I think and
>>>>> > didn't want anyone to be surprised. There is already some agreement in
>>>>> > the linked Jira issue. If there are no objections I'll get started later
>>>>> > today or tomorrow.
>>>>> >
>>>>> > -Tyson
>>>>> >
>>>>> >
>>>>> > [1]:
>>>>> > https://lists.apache.org/thread.html/r9ddc77a8fee58ad02f68e2d9a7f054aab3e55717cc88ad1d5bc49311%40%3Cdev.beam.apache.org%3E
>>>>> > [2]: https://issues.apache.org/jira/browse/BEAM-
>>>>> >
>>>>>



Re: Remove EOL'd Runners

2020-06-10 Thread Jan Lukavský

+1

On 6/10/20 5:51 PM, David Morávek wrote:

> +1
>
> On Tue, Jun 9, 2020 at 7:43 PM Ahmet Altay wrote:
>
>> Thank you Tyson!
>>
>> On Tue, Jun 9, 2020 at 10:20 AM Thomas Weise wrote:
>>
>>> +1
>>>
>>> On Tue, Jun 9, 2020 at 9:41 AM Robert Bradshaw wrote:
>>>
>>>> Makes sense to me.
>>>>
>>>> On Tue, Jun 9, 2020 at 8:45 AM Maximilian Michels wrote:
>>>>
>>>>> Thanks for the heads-up, Tyson! It's a sensible decision to remove
>>>>> unsupported runners.
>>>>>
>>>>> -Max
>>>>>
>>>>> On 09.06.20 16:51, Tyson Hamilton wrote:
>>>>> > Hi All,
>>>>> >
>>>>> > As part of the Fixit [1] I'd like to remove EOL'd runners, Apex and
>>>>> > Gearpump, as described in BEAM- [2]. This will be a big PR I think and
>>>>> > didn't want anyone to be surprised. There is already some agreement in
>>>>> > the linked Jira issue. If there are no objections I'll get started later
>>>>> > today or tomorrow.
>>>>> >
>>>>> > -Tyson
>>>>> >
>>>>> >
>>>>> > [1]:
>>>>> > https://lists.apache.org/thread.html/r9ddc77a8fee58ad02f68e2d9a7f054aab3e55717cc88ad1d5bc49311%40%3Cdev.beam.apache.org%3E
>>>>> > [2]: https://issues.apache.org/jira/browse/BEAM-
>>>>> >



Re: Running apache_beam python sdk without c/c++ libs

2020-06-10 Thread Noah Goodrich
I was hoping to use the Spark runner since Glue is just Spark with some
magic on top. And in our specific use case, we'd be looking at working with
S3, Kinesis, and MySQL RDS.

Sounds like this is a non-starter?

On Wed, Jun 10, 2020 at 9:33 AM Luke Cwik  wrote:

> Most runners are written in Java while others are cloud offerings which
> wouldn't work for your use case which limits you to use the direct runner
> (not meant for production/high performance applications). Beam Python SDK
> uses cython for performance reasons but I don't believe it strictly
> requires it as many unit tests run with and without cython enabled.
> Integrations between Beam and third party libraries may require it though
> so it likely depends on what you plan to do.
>
> On Wed, Jun 10, 2020 at 8:17 AM Noah Goodrich  wrote:
>
>> I am looking at using the Beam Python SDK in AWS Glue but it doesn't
>> support non-native python libraries (anything that is c/c++ based).
>>
>> Is the Beam Python SDK / runners able to be used without any c/c++
>> library dependencies?
>>
>


contributor permission for Beam Jira tickets

2020-06-10 Thread Stuart Perks
Hi, 

Could I be added as a Jira contributor so that I can assign a Jira ticket to myself, please?

User Name: Perks

Thanks,

Stuart

Season of Docs: Interested in working with Apache Beam

2020-06-10 Thread Cynthia Iradukunda
Greetings,

I am hoping this finds you well. I am keenly interested in contributing to
Apache Beam during the Season of Docs.

I am interested in growing my technical writing skills and see the Season
of Docs as an excellent way to achieve that goal. As an entry-level
technical writer, I would appreciate getting some recommendations on how I
can show my ability to work effectively on this project.

I am looking forward to hearing from you.

Best Regards,
Cynthia, Iradukunda


Re: Question on NEXMark

2020-06-10 Thread Andrew Pilloud
I think the author of this test is long gone, but the code originated
inside Google. This query is not part of the original Nexmark suite but was
designed to exercise corner cases caused by out of order events, so that is
what you are probably seeing. Here are relevant bits from the original
commit messages:

New query 11 to exercise session windows.

Q11 started as a basic session windows test
with out-of-order and delayed events.
This refines the trigger to limit the number
of events in sessions.

Andrew

On Wed, Jun 10, 2020 at 10:37 AM Sruthi S Kumar 
wrote:

> Hi,
>
> We are working on a Flink project and enhancing some state backend
> functionality. We are using NEXMark benchmark to compare different state
> backends performance of Flink. While running NEXMark queries using Flink
> runner of Beam we have noticed that there is quite a lot of non-existent
> read from the state-backed.
>
> For example, when running query 11 with RocksDB state-backed, we had
> around 368 successful reads while we had around 527 attempts to read
> non-existent reads. We are curious if that is intentional and if so what's
> the rationale behind it?
>
>
> --
> Regards,
>
> Sruthi
>


python precommit error - google-auth dependency?

2020-06-10 Thread Udi Meiri
Hi,
I'm trying to understand these "pip check" failures:

ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll
have rsa 4.1 which is incompatible


https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console

However, when I do
pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]

locally, the google-auth package is not installed at all.
Any ideas on how to debug where this requirement is coming from?


Re: python precommit error - google-auth dependency?

2020-06-10 Thread Valentyn Tymofieiev
> Any ideas on how to debug where this requirement is coming from?
You could try installing and calling pipdeptree [1] from a Jenkins job, and
see if it helps.

[1] https://pypi.org/project/pipdeptree/
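
As a complement to pipdeptree, a reverse lookup can also be sketched with the standard library alone. The snippet below is a minimal illustration and not part of Beam or pip: the helper names `requirement_name` and `who_requires` are made up for this example, and it only inspects packages installed in the current environment.

```python
import re
from importlib import metadata
from typing import List, Tuple


def requirement_name(requirement: str) -> str:
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    name = match.group(1) if match else ""
    # PEP 503 style normalization: lower-case, runs of "-", "_", "." become "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def who_requires(package: str) -> List[Tuple[str, str]]:
    """List (installed distribution, requirement string) pairs that mention `package`."""
    target = requirement_name(package)
    found = []
    for dist in metadata.distributions():
        # dist.requires is None when a distribution declares no requirements.
        for req in dist.requires or []:
            if requirement_name(req) == target:
                found.append((dist.metadata.get("Name") or "unknown", req))
    return sorted(found)


if __name__ == "__main__":
    # Print every installed distribution that declares a requirement on rsa.
    for dist_name, req in who_requires("rsa"):
        print(f"{dist_name} requires {req}")
```

Run inside the failing virtualenv, this should show which installed package carries the `rsa` pin, assuming that package is actually installed there.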
On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri  wrote:

> Hi,
> I'm trying to understand these "pip check" failures:
>
> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll have 
> rsa 4.1 which is incompatible
>
>
> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>
> However, when I do
> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>
> locally, the google-auth package is not installed at all.
> Any ideas on how to debug where this requirement is coming from?
>


Re: python precommit error - google-auth dependency?

2020-06-10 Thread Udi Meiri
Thanks, that helped in an unexpected way. :)
I should have used the "gcp" extra instead of "cloud" in my pip install
command above.

On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev 
wrote:

> > Any ideas on how to debug where this requirement is coming from?
> You could try installing and calling pipdeptree [1] from a Jenkins job,
> and see if it helps.
>
> [1] https://pypi.org/project/pipdeptree/
> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri  wrote:
>
>> Hi,
>> I'm trying to understand these "pip check" failures:
>>
>> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll have 
>> rsa 4.1 which is incompatible
>>
>>
>> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>>
>> However, when I do
>> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>>
>> locally, the google-auth package is not installed at all.
>> Any ideas on how to debug where this requirement is coming from?
>>
>


Re: python precommit error - google-auth dependency?

2020-06-10 Thread Udi Meiri
Seems like manually installing rsa==4.0 satisfies deps, but pip doesn't do
transitive deps well.

Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in setup.py?
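
To make the constraint in the error message concrete, here is a small sketch of the range check behind `rsa<4.1,>=3.1.4`. It assumes plain dotted release numbers and ignores pre-releases and the rest of PEP 440; the function names are hypothetical, and real tooling uses a proper version parser.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted release such as '3.1.4' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper, i.e. the spec '>=lower,<upper'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```

Under these assumptions `in_range("4.1", "3.1.4", "4.1")` is false while `in_range("4.0", "3.1.4", "4.1")` is true, which matches the observation that rsa 4.1 fails the check and rsa==4.0 satisfies it.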

On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri  wrote:

> Thanks, that helped in an unexpected way. :)
> I should have used the "gcp" extra instead of "cloud" in my pip install
> command above.
>
> On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev 
> wrote:
>
>> > Any ideas on how to debug where this requirement is coming from?
>> You could try installing and calling pipdeptree [1] from a Jenkins job,
>> and see if it helps.
>>
>> [1] https://pypi.org/project/pipdeptree/
>> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri  wrote:
>>
>>> Hi,
>>> I'm trying to understand these "pip check" failures:
>>>
>>> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll have 
>>> rsa 4.1 which is incompatible
>>>
>>>
>>> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>>>
>>> However, when I do
>>> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>>>
>>> locally, the google-auth package is not installed at all.
>>> Any ideas on how to debug where this requirement is coming from?
>>>
>>


Re: Remove EOL'd Runners

2020-06-10 Thread Luke Cwik
The jobs won't be deleted but will be disabled. I can help delete the jobs
from Jenkins once the Jenkins configurations are removed; either ping me
directly or update this thread when that should be done.

On Wed, Jun 10, 2020 at 10:38 AM Kenneth Knowles  wrote:

> +1
>
> All Jenkins configs are in the repo. There's a lag between merge and run
> of the "seed job" that syncs our configs. We can do a manual run of it, or
> just not worry about the temporary redness in the jobs that will be deleted
> anyhow.
>
> On Wed, Jun 10, 2020 at 8:57 AM Jan Lukavský  wrote:
>
>> +1
>> On 6/10/20 5:51 PM, David Morávek wrote:
>>
>> +1
>>
>> On Tue, Jun 9, 2020 at 7:43 PM Ahmet Altay  wrote:
>>
>>> Thank you Tyson!
>>>
>>> On Tue, Jun 9, 2020 at 10:20 AM Thomas Weise  wrote:
>>>
>>>> +1
>>>>
>>>>
>>>> On Tue, Jun 9, 2020 at 9:41 AM Robert Bradshaw
>>>> wrote:
>>>>
>>>>> Makes sense to me.
>>>>>
>>>>> On Tue, Jun 9, 2020 at 8:45 AM Maximilian Michels
>>>>> wrote:
>>>>>
>>>>>> Thanks for the heads-up, Tyson! It's a sensible decision to remove
>>>>>> unsupported runners.
>>>>>>
>>>>>> -Max
>>>>>>
>>>>>> On 09.06.20 16:51, Tyson Hamilton wrote:
>>>>>> > Hi All,
>>>>>> >
>>>>>> > As part of the Fixit [1] I'd like to remove EOL'd runners, Apex and
>>>>>> > Gearpump, as described in BEAM- [2]. This will be a big PR I think and
>>>>>> > didn't want anyone to be surprised. There is already some agreement in
>>>>>> > the linked Jira issue. If there are no objections I'll get started later
>>>>>> > today or tomorrow.
>>>>>> >
>>>>>> > -Tyson
>>>>>> >
>>>>>> >
>>>>>> > [1]:
>>>>>> > https://lists.apache.org/thread.html/r9ddc77a8fee58ad02f68e2d9a7f054aab3e55717cc88ad1d5bc49311%40%3Cdev.beam.apache.org%3E
>>>>>> > [2]: https://issues.apache.org/jira/browse/BEAM-
>>>>>> >
>>>>>>
>>>>>


Re: Remove EOL'd Runners

2020-06-10 Thread Tyson Hamilton
Sounds good, thanks.

I removed Gearpump first and will move on to Apex later today. When that PR
is merged we can clean up the Jenkins jobs in one swoop for both removed
runners.

On Wed, Jun 10, 2020, 11:18 AM Luke Cwik  wrote:

> The jobs won't be deleted but will be disabled. I can help delete the jobs
> from Jenkins once the jenkins configurations are removed either ping me
> directly or update this thread when that should be done.
>
> On Wed, Jun 10, 2020 at 10:38 AM Kenneth Knowles  wrote:
>
>> +1
>>
>> All Jenkins configs are in the repo. There's a lag between merge and run
>> of the "seed job" that syncs our configs. We can do a manual run of it, or
>> just not worry about the temporary redness in the jobs that will be deleted
>> anyhow.
>>
>> On Wed, Jun 10, 2020 at 8:57 AM Jan Lukavský  wrote:
>>
>>> +1
>>> On 6/10/20 5:51 PM, David Morávek wrote:
>>>
>>> +1
>>>
>>> On Tue, Jun 9, 2020 at 7:43 PM Ahmet Altay  wrote:
>>>
>>>> Thank you Tyson!
>>>>
>>>> On Tue, Jun 9, 2020 at 10:20 AM Thomas Weise wrote:
>>>>
>>>>> +1
>>>>>
>>>>>
>>>>> On Tue, Jun 9, 2020 at 9:41 AM Robert Bradshaw
>>>>> wrote:
>>>>>
>>>>>> Makes sense to me.
>>>>>>
>>>>>> On Tue, Jun 9, 2020 at 8:45 AM Maximilian Michels
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for the heads-up, Tyson! It's a sensible decision to remove
>>>>>>> unsupported runners.
>>>>>>>
>>>>>>> -Max
>>>>>>>
>>>>>>> On 09.06.20 16:51, Tyson Hamilton wrote:
>>>>>>> > Hi All,
>>>>>>> >
>>>>>>> > As part of the Fixit [1] I'd like to remove EOL'd runners, Apex
>>>>>>> > and Gearpump, as described in BEAM- [2]. This will be a big PR I
>>>>>>> > think and didn't want anyone to be surprised. There is already some
>>>>>>> > agreement in the linked Jira issue. If there are no objections I'll
>>>>>>> > get started later today or tomorrow.
>>>>>>> >
>>>>>>> > -Tyson
>>>>>>> >
>>>>>>> >
>>>>>>> > [1]:
>>>>>>> > https://lists.apache.org/thread.html/r9ddc77a8fee58ad02f68e2d9a7f054aab3e55717cc88ad1d5bc49311%40%3Cdev.beam.apache.org%3E
>>>>>>> > [2]: https://issues.apache.org/jira/browse/BEAM-
>>>>>>> >
>>>>>>>
>>>>>>


Re: Ensuring messages are processed and emitted in-order

2020-06-10 Thread Luke Cwik
For runners that support @RequiresTimeSortedInput, all your input will come
time sorted (as long as your element's timestamp tracks the order that you
want).
For runners that don't support this, you need to build a StatefulDoFn that
buffers out of order events and reorders them to the order that you need.
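
The buffering idea can be sketched in plain Python, independent of any Beam API. This is only an illustration of the reordering logic under stated assumptions: the class name `TimeSortedBuffer` is hypothetical, timestamps are integers, and in an actual stateful DoFn the buffer would live in per-key state and be flushed from an event-time timer rather than by an explicit call.

```python
import heapq
from typing import Any, List, Tuple


class TimeSortedBuffer:
    """Buffers (timestamp, value) pairs and releases them in timestamp order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._arrival = 0  # tie-breaker so equal timestamps keep arrival order

    def add(self, timestamp: int, value: Any) -> None:
        """Store one possibly out-of-order event."""
        heapq.heappush(self._heap, (timestamp, self._arrival, value))
        self._arrival += 1

    def release_up_to(self, watermark: int) -> List[Tuple[int, Any]]:
        """Pop every buffered event with timestamp <= watermark, oldest first."""
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            timestamp, _, value = heapq.heappop(self._heap)
            ready.append((timestamp, value))
        return ready
```

Events are held until the caller decides that nothing older can still arrive (the `watermark` argument here), which is the same trade-off between latency and completeness that any reordering buffer has to make.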

@Pablo Estrada  Any other suggestions for supporting
CDC type pipelines?

On Tue, Jun 9, 2020 at 6:59 PM Catlyn Kong  wrote:

> Thanks a lot for the response!
>
> We have several business use cases that rely strongly on ordering by Kafka
> offset:
> 1) streaming unwindowed inner join: say we want to join users with reviews
> on user_id. Here are the schemas for two streams:
> user:
>
>- user_id
>- name
>- timestamp
>
> reviews:
>
>- review_id
>- user_id
>- timestamp
>
> Here are the messages in each stream ordered by kafka offset:
> user:
> (1, name_a, 60), (2, name_b, 120), (1, name_c, 240)
> reviews:
> (ABC, 1, 90), (DEF, 2, 360)
> I would expect to receive following output messages:
> (1, name_a, ABC) at timestamp 90
> (1, name_c, ABC) at timestamp 240
> (2, name_b, DEF) at timestamp 360
> This can be done in native Flink since Flink kafka consumer reads from
> each partition sequentially. But without an ordering guarantee, we can end
> up with arbitrary results. So how would we implement this in Beam?
> 2) unwindowed aggregation: aggregate all the employees for every
> organization. Say we have a new employee stream with the following schema:
> new_employee:
>
>- organization_id
>- employee_name
>
> And here are the messages ordered by Kafka offset:
> (1, name_a), (2, name_b), (2, name_c), (1, name_d)
> I would expect the output to be:
> (1, [name_a]), (2, [name_b]), (2, [name_b, name_c]), (1, [name_a, name_d])
> Again without an ordering guarantee, the result is non deterministic.
>
> Change data capture (CDC) streams are a very common use case for our data
> pipeline. As in the examples above we rely on Kafka offsets to make sure we
> process data mutations in the proper order. While in some cases we have
> Flink native solutions to these problems (Flink provides ordering
> guarantees within the chosen key), we are now building some new Beam
> applications that would require ordering guarantees. What is the
> recommended approach in Beam for such use cases? If this isn’t currently
> supported, do we have any near plan to add native ordering support in Beam?
>
>
> On 2020/06/09 20:37:22, Luke Cwik  wrote:
> > This will likely break due to:
> > * workers can have more than one thread and hence process the source in
> > parallel
> > * splitting a source allows for the source to be broken up into multiple
> > restrictions and hence the runner can process those restrictions in any
> > order they want. (let's say your Kafka partition has unconsumed commit
> > offset range [20, 100), this could be split into [20, 60), [60, 100) and
> > the [60, 100) offset range could be processed first)
> >
> > You're right that you need to sort the output however you want within your
> > DoFn before you make external calls to Kafka (this prevents you from using
> > the KafkaIO sink implementation as a transform). There is an annotation
> > @RequiresTimeSortedInput which is a special case for this sorting if you
> > want it to be sorted by the element's timestamp but still you'll need to
> > write to Kafka directly yourself from your DoFn.
> >
> > On Mon, Jun 8, 2020 at 4:24 PM Hadi Zhang  wrote:
> >
> > > We are using the Beam 2.20 Python SDK on a Flink 1.9 runner. Our
> > > messages originate from a custom source that consumes messages from a
> > > Kafka topic and emits them in the order of their Kafka offsets to a
> > > DoFn. After this DoFn processes the messages, they are emitted to a
> > > custom sink that sends messages to a Kafka topic.
> > >
> > > We want to process those messages in the order in which we receive
> > > them from Kafka and then emit them to the Kafka sink in the same
> > > order, but based on our understanding Beam does not provide an
> > > in-order transport. However, in practice we noticed that with a Python
> > > SDK worker on Flink and a parallelism setting of 1 and one sdk_worker
> > > instance, messages seem to be both processed and emitted in order. Is
> > > that implementation-specific in-order behavior something that we can
> > > rely on, or is it very likely that this will break at some future
> > > point?
> > >
> > > In case it's not recommended to depend on that behavior what is the
> > > best approach for in-order processing?
> > >
> > > https://stackoverflow.com/questions/45888719/processing-total-ordering-of-events-by-key-using-apache-beam
> > > recommends to order events in a heap, but according to our
> > > understanding this approach will only work when directly writing to an
> > > external system.
> > >
> >
>


Re: Ensuring messages are processed and emitted in-order

2020-06-10 Thread Reuven Lax
I don't know how well RequiresTimeSortedInput will work for any late data.

I think you will want to include the Kafka offset in your records (unless
the records have their own sequence number) and then use state to buffer
and sort. There is a proposal (and work in progress) for a sorted state
API, which will make this easier and more efficient.
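
As a rough sketch of the "carry the offset, then buffer and sort" approach, here is a pure-Python version applied to the employee aggregation example quoted below. Everything in it is an assumption for illustration: the names `OffsetReorderBuffer` and `running_rosters` are hypothetical, offsets are assumed to be contiguous within a partition (with gaps, a timer-based flush would be needed), and in Beam the pending map would be kept in per-key state.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple


class OffsetReorderBuffer:
    """Releases records strictly in offset order, holding back any gaps."""

    def __init__(self, first_offset: int = 0) -> None:
        self._next = first_offset
        self._pending: Dict[int, Any] = {}

    def add(self, offset: int, record: Any) -> List[Any]:
        """Store one record; return all records that are now contiguous."""
        if offset < self._next:
            return []  # already released, e.g. a redelivered record
        self._pending[offset] = record
        ready = []
        while self._next in self._pending:
            ready.append(self._pending.pop(self._next))
            self._next += 1
        return ready


def running_rosters(
    records: Iterable[Tuple[int, str]]
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (organization_id, [names so far]) for each in-order record."""
    rosters: Dict[int, List[str]] = {}
    for org_id, name in records:
        rosters.setdefault(org_id, []).append(name)
        yield org_id, list(rosters[org_id])


# Records arrive out of order as (offset, (organization_id, employee_name)).
arrivals = [
    (1, (2, "name_b")),
    (0, (1, "name_a")),
    (3, (1, "name_d")),
    (2, (2, "name_c")),
]
buffer = OffsetReorderBuffer()
ordered: List[Tuple[int, str]] = []
for offset, record in arrivals:
    ordered.extend(buffer.add(offset, record))

for org_id, names in running_rosters(ordered):
    print(org_id, names)
```

With the offsets restored, the aggregation sees the records in the order (1, name_a), (2, name_b), (2, name_c), (1, name_d) and so produces the deterministic sequence of rosters that the example expects.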

Reuven

On Wed, Jun 10, 2020 at 1:25 PM Luke Cwik  wrote:

> For runners that support @RequiresTimeSortedInput, all your input will
> come time sorted (as long as your element's timestamp tracks the order that
> you want).
> For runners that don't support this, you need to build a StatefulDoFn that
> buffers out of order events and reorders them to the order that you need.
>
> @Pablo Estrada  Any other suggestions for supporting
> CDC type pipelines?
>
> On Tue, Jun 9, 2020 at 6:59 PM Catlyn Kong  wrote:
>
>> Thanks a lot for the response!
>>
>> We have several business use cases that rely strongly on ordering by
>> Kafka offset:
>> 1) streaming unwindowed inner join: say we want to join users with
>> reviews on user_id. Here are the schemas for two streams:
>> user:
>>
>>- user_id
>>- name
>>- timestamp
>>
>> reviews:
>>
>>- review_id
>>- user_id
>>- timestamp
>>
>> Here are the messages in each stream ordered by kafka offset:
>> user:
>> (1, name_a, 60), (2, name_b, 120), (1, name_c, 240)
>> reviews:
>> (ABC, 1, 90), (DEF, 2, 360)
>> I would expect to receive following output messages:
>> (1, name_a, ABC) at timestamp 90
>> (1, name_c, ABC) at timestamp 240
>> (2, name_b, DEF) at timestamp 360
>> This can be done in native Flink since Flink kafka consumer reads from
>> each partition sequentially. But without an ordering guarantee, we can end
>> up with arbitrary results. So how would we implement this in Beam?
>> 2) unwindowed aggregation: aggregate all the employees for every
>> organization. Say we have a new employee stream with the following schema:
>> new_employee:
>>
>>- organization_id
>>- employee_name
>>
>> And here are the messages ordered by Kafka offset:
>> (1, name_a), (2, name_b), (2, name_c), (1, name_d)
>> I would expect the output to be:
>> (1, [name_a]), (2, [name_b]), (2, [name_b, name_c]), (1, [name_a, name_d])
>> Again without an ordering guarantee, the result is non deterministic.
>>
>> Change data capture (CDC) streams are a very common use case for our data
>> pipeline. As in the examples above we rely on Kafka offsets to make sure we
>> process data mutations in the proper order. While in some cases we have
>> Flink native solutions to these problems (Flink provides ordering
>> guarantees within the chosen key), we are now building some new Beam
>> applications that would require ordering guarantees. What is the
>> recommended approach in Beam for such use cases? If this isn’t currently
>> supported, do we have any near plan to add native ordering support in Beam?
>>
>>
>> On 2020/06/09 20:37:22, Luke Cwik  wrote:
>> > This will likely break due to:
>> > * workers can have more than one thread and hence process the source in
>> > parallel
>> > * splitting a source allows for the source to be broken up into multiple
>> > restrictions and hence the runner can process those restrictions in any
>> > order they want. (let's say your Kafka partition has unconsumed commit
>> > offset range [20, 100), this could be split into [20, 60), [60, 100) and
>> > the [60, 100) offset range could be processed first)
>> >
>> > You're right that you need to sort the output however you want within your
>> > DoFn before you make external calls to Kafka (this prevents you from using
>> > the KafkaIO sink implementation as a transform). There is an annotation
>> > @RequiresTimeSortedInput which is a special case for this sorting if you
>> > want it to be sorted by the element's timestamp but still you'll need to
>> > write to Kafka directly yourself from your DoFn.
>> >
>> > On Mon, Jun 8, 2020 at 4:24 PM Hadi Zhang  wrote:
>> >
>> > > We are using the Beam 2.20 Python SDK on a Flink 1.9 runner. Our
>> > > messages originate from a custom source that consumes messages from a
>> > > Kafka topic and emits them in the order of their Kafka offsets to a
>> > > DoFn. After this DoFn processes the messages, they are emitted to a
>> > > custom sink that sends messages to a Kafka topic.
>> > >
>> > > We want to process those messages in the order in which we receive
>> > > them from Kafka and then emit them to the Kafka sink in the same
>> > > order, but based on our understanding Beam does not provide an
>> > > in-order transport. However, in practice we noticed that with a Python
>> > > SDK worker on Flink and a parallelism setting of 1 and one sdk_worker
>> > > instance, messages seem to be both processed and emitted in order. Is
>> > > that implementation-specific in-order behavior something that we can
>> > > rely on, or is it very 

Re: python precommit error - google-auth dependency?

2020-06-10 Thread Kenneth Knowles
You may be interested in following https://github.com/pypa/pip/issues/988 if
you are not already.

Kenn

On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri  wrote:

> Seems like manually installing rsa==4.0 satisfies deps, but pip doesn't do
> transitive deps well.
>
> Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in
> setup.py?
>
> On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri  wrote:
>
>> Thanks, that helped in an unexpected way. :)
>> I should have used the "gcp" extra instead of "cloud" in my pip install
>> command above.
>>
>> On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> > Any ideas on how to debug where this requirement is coming from?
>>> You could try installing and calling pipdeptree [1] from a Jenkins job,
>>> and see if it helps.
>>>
>>> [1] https://pypi.org/project/pipdeptree/
>>> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri  wrote:
>>>
>>>> Hi,
>>>> I'm trying to understand these "pip check" failures:
>>>>
>>>> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll have
>>>> rsa 4.1 which is incompatible
>>>>
>>>>
>>>> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>>>>
>>>> However, when I do
>>>> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>>>>
>>>> locally, the google-auth package is not installed at all.
>>>> Any ideas on how to debug where this requirement is coming from?
>>>>
>>>


Re: Question on NEXMark

2020-06-10 Thread Kenneth Knowles
It sounds like it could be something worth addressing. I don't really know
the cost of this behavior. The pipeline is pretty easy to read. The
pipeline itself does not explicitly manage any state, so it would be in the
Flink execution of the GroupByKey primitive transform. The relevant code is
probably in ReduceFnRunner/WatermarkHold, which is actually shared across
many runners.

Kenn

On Wed, Jun 10, 2020 at 11:25 AM Andrew Pilloud  wrote:

> I think the author of this test is long gone, but the code originated
> inside google. This query is not part of the original Nexmark suite but was
> designed to exercise corner cases caused by out of order events, so that is
> what you are probably seeing. Here are relevant bits from the original
> commit messages:
>
> New query 11 to exercise session windows.
>
> Q11 started as a basic session windows test
> with out-of-order and delayed events.
> This refines the trigger to limit the number
> of events in sessions.
>
> Andrew
>
> On Wed, Jun 10, 2020 at 10:37 AM Sruthi S Kumar 
> wrote:
>
>> Hi,
>>
>> We are working on a Flink project and enhancing some state backend
>> functionality. We are using the NEXMark benchmark to compare the
>> performance of different Flink state backends. While running NEXMark
>> queries using the Flink runner of Beam, we have noticed quite a lot of
>> reads of non-existent entries from the state backend.
>>
>> For example, when running query 11 with the RocksDB state backend, we had
>> around 368 successful reads and around 527 attempts to read non-existent
>> entries. We are curious whether that is intentional and, if so, what the
>> rationale behind it is.
>>
>>
>> --
>> Regards,
>>
>> Sruthi
>>
>


Re: python precommit error - google-auth depenedency?

2020-06-10 Thread Ahmet Altay
On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles  wrote:

> You may be interested in following https://github.com/pypa/pip/issues/988 if
> you are not already.
>
> Kenn
>
> On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri  wrote:
>
>> Seems like manually installing rsa==4.0 satisfies deps, but pip doesn't
>> do transitive deps well.
>>
>> Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in
>> setup.py?
>>
>
Did you find where the google-auth dependency is coming from? We might try
to fix the problem at the source of that dependency instead of adding rsa
to beam's setup.py.


>
>> On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri  wrote:
>>
>>> Thanks, that helped in an unexpected way. :)
>>> I should have used the "gcp" extra instead of "cloud" in my pip install
>>> command above.
>>>
>>> On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 > Any ideas on how to debug where this requirement is coming from?
 You could try installing and calling pipdeptree [1] from a Jenkins job,
 and see if it helps.

 [1] https://pypi.org/project/pipdeptree/
 On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri  wrote:

> Hi,
> I'm trying to understand these "pip check" failures:
>
> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll 
> have rsa 4.1 which is incompatible
>
>
> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>
> However, when I do
> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>
> locally, the google-auth package is not installed at all.
> Any ideas on how to debug where this requirement is coming from?
>



Re: python precommit error - google-auth depenedency?

2020-06-10 Thread Udi Meiri
On Wed, Jun 10, 2020 at 1:59 PM Ahmet Altay  wrote:

>
>
> On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles  wrote:
>
>> You may be interested in following https://github.com/pypa/pip/issues/988 if
>> you are not already.
>>
>> Kenn
>>
>> On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri  wrote:
>>
>>> Seems like manually installing rsa==4.0 satisfies deps, but pip doesn't
>>> do transitive deps well.
>>>
>>> Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in
>>> setup.py?
>>>
>>
> Did you find where the google-auth dependency is coming from? We might try
> to fix the problem at the source of that dependency instead of adding rsa
> to beam's setup.py.
>

oauth2client depends on rsa>=3.1.4 with no upper limit. rsa 4.1 was released
today.
The places that require rsa<4.1 are deeper in the dependency tree. For
example:

google-cloud-bigquery==1.24.0
  - google-api-core [required: >=1.15.0,<2.0dev, installed: 1.20.0]
    - google-auth [required: >=1.14.0,<2.0dev, installed: 1.16.1]
      - rsa [required: >=3.1.4,<4.1, installed: 4.1]
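
The conflict in the tree above comes down to a version-range check: rsa 4.1
satisfies an open-ended ">=3.1.4" but not "<4.1,>=3.1.4". Below is a minimal
sketch of that check in plain Python. It is an assumption-laden toy: it only
understands purely numeric versions (so it would not parse "2.0dev"), and the
function names are invented for this example; real tools use the full version
specifier rules.

```python
import operator

# Two-character operators are listed first so ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn '3.1.4' into (3, 1, 4). Only numeric components are supported."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(left, right):
    """Pad the shorter tuple with zeros so (4, 1) compares like (4, 1, 0)."""
    width = max(len(left), len(right))
    return (left + (0,) * (width - len(left)),
            right + (0,) * (width - len(right)))


def satisfies(version, spec):
    """Return True if version meets every comma-separated clause in spec."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                left, right = _pad(candidate, bound)
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
    return True


if __name__ == "__main__":
    print(satisfies("4.1", ">=3.1.4"))       # open-ended lower bound only
    print(satisfies("4.1", "<4.1,>=3.1.4"))  # range with the upper bound
    print(satisfies("4.0", "<4.1,>=3.1.4"))
```

With these inputs, 4.1 passes the open-ended requirement but fails the pinned
range, while 4.0 satisfies both, which matches the observation that manually
installing rsa==4.0 makes "pip check" happy.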


>
>>
>>> On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri  wrote:
>>>
 Thanks, that helped in an unexpected way. :)
 I should have used the "gcp" extra instead of "cloud" in my pip install
 command above.

 On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> > Any ideas on how to debug where this requirement is coming from?
> You could try installing and calling pipdeptree [1] from a Jenkins
> job, and see if it helps.
>
> [1] https://pypi.org/project/pipdeptree/
> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri  wrote:
>
>> Hi,
>> I'm trying to understand these "pip check" failures:
>>
>> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll 
>> have rsa 4.1 which is incompatible
>>
>>
>> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>>
>> However, when I do
>> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>>
>> locally, the google-auth package is not installed at all.
>> Any ideas on how to debug where this requirement is coming from?
>>
>




Re: python precommit error - google-auth depenedency?

2020-06-10 Thread Ahmet Altay
Looks like there is an attempt to fix this:
https://github.com/googleapis/google-auth-library-python/pull/524

On Wed, Jun 10, 2020 at 2:07 PM Udi Meiri  wrote:

>
>
> On Wed, Jun 10, 2020 at 1:59 PM Ahmet Altay  wrote:
>
>>
>>
>> On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles  wrote:
>>
>>> You may be interested in following
>>> https://github.com/pypa/pip/issues/988 if you are not already.
>>>
>>> Kenn
>>>
>>> On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri  wrote:
>>>
 Seems like manually installing rsa==4.0 satisfies deps, but pip doesn't
 do transitive deps well.

 Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in
 setup.py?

>>>
>> Did you find where the google-auth dependency is coming from? We might
>> try to fix the problem at the source of that dependency instead of adding
>> rsa to beam's setup.py.
>>
>
> oauth2client depends on rsa>=3.14 with no upper limit. rsa 4.1 was
> released today.
> The places that require rsa<4.1 are deeper in the dependency tree. For
> example:
>
> google-cloud-bigquery==1.24.0
>   - google-api-core [required: >=1.15.0,<2.0dev, installed: 1.20.0]
> - google-auth [required: >=1.14.0,<2.0dev, installed: 1.16.1]
>   - rsa [required: >=3.1.4,<4.1, installed: 4.1]
>
>
>>
>>>
 On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri  wrote:

> Thanks, that helped in an unexpected way. :)
> I should have used the "gcp" extra instead of "cloud" in my pip
> install command above.
>
> On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> > Any ideas on how to debug where this requirement is coming from?
>> You could try installing and calling pipdeptree [1] from a Jenkins
>> job, and see if it helps.
>>
>> [1] https://pypi.org/project/pipdeptree/
>> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri  wrote:
>>
>>> Hi,
>>> I'm trying to understand these "pip check" failures:
>>>
>>> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll 
>>> have rsa 4.1 which is incompatible
>>>
>>>
>>> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>>>
>>> However, when I do
>>> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>>>
>>> locally, the google-auth package is not installed at all.
>>> Any ideas on how to debug where this requirement is coming from?
>>>
>>


Re: python precommit error - google-auth depenedency?

2020-06-10 Thread Udi Meiri
rsa 4.1 drops Python 2 support, so I'm not sure if we're ready for that yet.

On Wed, Jun 10, 2020 at 2:20 PM Ahmet Altay  wrote:

> Looks like there is an attempt to fix this:
> https://github.com/googleapis/google-auth-library-python/pull/524
>
> On Wed, Jun 10, 2020 at 2:07 PM Udi Meiri  wrote:
>
>>
>>
>> On Wed, Jun 10, 2020 at 1:59 PM Ahmet Altay  wrote:
>>
>>>
>>>
>>> On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles  wrote:
>>>
 You may be interested in following
 https://github.com/pypa/pip/issues/988 if you are not already.

 Kenn

 On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri  wrote:

> Seems like manually installing rsa==4.0 satisfies deps, but pip
> doesn't do transitive deps well.
>
> Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in
> setup.py?
>

>>> Did you find where the google-auth dependency is coming from? We might
>>> try to fix the problem at the source of that dependency instead of adding
>>> rsa to beam's setup.py.
>>>
>>
>> oauth2client depends on rsa>=3.14 with no upper limit. rsa 4.1 was
>> released today.
>> The places that require rsa<4.1 are deeper in the dependency tree. For
>> example:
>>
>> google-cloud-bigquery==1.24.0
>>   - google-api-core [required: >=1.15.0,<2.0dev, installed: 1.20.0]
>> - google-auth [required: >=1.14.0,<2.0dev, installed: 1.16.1]
>>   - rsa [required: >=3.1.4,<4.1, installed: 4.1]
>>
>>
>>>

> On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri  wrote:
>
>> Thanks, that helped in an unexpected way. :)
>> I should have used the "gcp" extra instead of "cloud" in my pip
>> install command above.
>>
>> On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>>
>>> > Any ideas on how to debug where this requirement is coming from?
>>> You could try installing and calling pipdeptree [1] from a Jenkins
>>> job, and see if it helps.
>>>
>>> [1] https://pypi.org/project/pipdeptree/
>>> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri  wrote:
>>>
 Hi,
 I'm trying to understand these "pip check" failures:

 ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll 
 have rsa 4.1 which is incompatible



 https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console

 However, when I do
 pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]

 locally, the google-auth package is not installed at all.
 Any ideas on how to debug where this requirement is coming from?

>>>




Beam Summit Status Report - 6/10

2020-06-10 Thread Brittany Hermann
Hi folks,

I wanted to provide you with the Beam Summit Status report from today's
meeting. If you would like to join the next public meeting on Wednesday,
June 24th at 11:30 AM PST please let me know and I will send a calendar
invite over to you!

Also, don't forget to submit your CFP by June 15th and register for the
Summit!


https://docs.google.com/document/d/11PXOBUbeldgPqz6OlTswCal6SxyX76Bb_ZVKBdwsd7o/edit?usp=sharing

Have a great day!

-- 

Brittany Hermann

Open Source Program Manager (Provided by Adecco Staffing)

1190 Bordeaux Drive, Building 4, Sunnyvale, CA 94089



Re: python precommit error - google-auth depenedency?

2020-06-10 Thread Bu Sun Kim
Hi,

google-auth has been released (with the wider pin on rsa).

On Wed, Jun 10, 2020 at 6:07 PM Ahmet Altay  wrote:

>
>
> On Wed, Jun 10, 2020 at 4:07 PM Kyle Weaver  wrote:
>
>> The fix to google-auth has been merged. Is the plan just to wait until a
>> new version of google-auth is released and ignore the failing tests until
>> then? (btw I filed a JIRA for this before I realized it was already being
>> discussed here: https://issues.apache.org/jira/browse/BEAM-10232)
>>
>
> Could we add it as a test dependency? Or if that is not possible, add it
> but remove it before next release?
>
> It seems like there is a release PR on google-auth (
> https://github.com/googleapis/google-auth-library-python/pull/525). I
> asked +Bu Sun Kim  on the PR, they usually release
> pretty quickly.
>
>
>>
>> On Wed, Jun 10, 2020 at 3:21 PM Udi Meiri  wrote:
>>
>>> Yes you're right, Py2 envs are still using 4.0.
>>>
>>> On Wed, Jun 10, 2020 at 3:03 PM Ahmet Altay  wrote:
>>>


 On Wed, Jun 10, 2020 at 2:25 PM Udi Meiri  wrote:

> 4.1 drops Python 2 support, so I'm not sure if we're ready for that
> yet.
>

 Wouldn't that work by default? In python 2 oauth2client's rsa>3.14
 requirement will resolve to latest python2 supporting version of rsa (4.0?)


>
> On Wed, Jun 10, 2020 at 2:20 PM Ahmet Altay  wrote:
>
>> Looks like there is an attempt to fix this:
>> https://github.com/googleapis/google-auth-library-python/pull/524
>>
>> On Wed, Jun 10, 2020 at 2:07 PM Udi Meiri  wrote:
>>
>>>
>>>
>>> On Wed, Jun 10, 2020 at 1:59 PM Ahmet Altay 
>>> wrote:
>>>


 On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles 
 wrote:

> You may be interested in following
> https://github.com/pypa/pip/issues/988 if you are not already.
>
> Kenn
>
> On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri 
> wrote:
>
>> Seems like manually installing rsa==4.0 satisfies deps, but pip
>> doesn't do transitive deps well.
>>
>> Would it be right to put a direct dependency on rsa<4.1,>=3.1.4
>> in setup.py?
>>
>
 Did you find where the google-auth dependency is coming from? We
 might try to fix the problem at the source of that dependency instead 
 of
 adding rsa to beam's setup.py.

>>>
>>> oauth2client depends on rsa>=3.14 with no upper limit. rsa 4.1 was
>>> released today.
>>> The places that require rsa<4.1 are deeper in the dependency tree.
>>> For example:
>>>
>>> google-cloud-bigquery==1.24.0
>>>   - google-api-core [required: >=1.15.0,<2.0dev, installed: 1.20.0]
>>> - google-auth [required: >=1.14.0,<2.0dev, installed: 1.16.1]
>>>   - rsa [required: >=3.1.4,<4.1, installed: 4.1]
>>>
>>>

>
>> On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri 
>> wrote:
>>
>>> Thanks, that helped in an unexpected way. :)
>>> I should have used the "gcp" extra instead of "cloud" in my pip
>>> install command above.
>>>
>>> On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 > Any ideas on how to debug where this requirement is coming
 from?
 You could try installing and calling pipdeptree [1] from a
 Jenkins job, and see if it helps.

 [1] https://pypi.org/project/pipdeptree/
 On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri 
 wrote:

> Hi,
> I'm trying to understand these "pip check" failures:
>
> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but 
> you'll have rsa 4.1 which is incompatible
>
>
>
> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>
> However, when I do
> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>
> locally, the google-auth package is not installed at all.
> Any ideas on how to debug where this requirement is coming
> from?
>



Re: python precommit error - google-auth depenedency?

2020-06-10 Thread Ahmet Altay
On Wed, Jun 10, 2020 at 4:07 PM Kyle Weaver  wrote:

> The fix to google-auth has been merged. Is the plan just to wait until a
> new version of google-auth is released and ignore the failing tests until
> then? (btw I filed a JIRA for this before I realized it was already being
> discussed here: https://issues.apache.org/jira/browse/BEAM-10232)
>

Could we add it as a test dependency? Or if that is not possible, add it
but remove it before next release?

It seems like there is a release PR on google-auth
(https://github.com/googleapis/google-auth-library-python/pull/525). I asked
Bu Sun Kim on the PR; they usually release pretty quickly.


>
> On Wed, Jun 10, 2020 at 3:21 PM Udi Meiri  wrote:
>
>> Yes you're right, Py2 envs are still using 4.0.
>>
>> On Wed, Jun 10, 2020 at 3:03 PM Ahmet Altay  wrote:
>>
>>>
>>>
>>> On Wed, Jun 10, 2020 at 2:25 PM Udi Meiri  wrote:
>>>
 4.1 drops Python 2 support, so I'm not sure if we're ready for that yet.

>>>
>>> Wouldn't that work by default? In python 2 oauth2client's rsa>3.14
>>> requirement will resolve to latest python2 supporting version of rsa (4.0?)
>>>
>>>

 On Wed, Jun 10, 2020 at 2:20 PM Ahmet Altay  wrote:

> Looks like there is an attempt to fix this:
> https://github.com/googleapis/google-auth-library-python/pull/524
>
> On Wed, Jun 10, 2020 at 2:07 PM Udi Meiri  wrote:
>
>>
>>
>> On Wed, Jun 10, 2020 at 1:59 PM Ahmet Altay  wrote:
>>
>>>
>>>
>>> On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles 
>>> wrote:
>>>
 You may be interested in following
 https://github.com/pypa/pip/issues/988 if you are not already.

 Kenn

 On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri 
 wrote:

> Seems like manually installing rsa==4.0 satisfies deps, but pip
> doesn't do transitive deps well.
>
> Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in
> setup.py?
>

>>> Did you find where the google-auth dependency is coming from? We
>>> might try to fix the problem at the source of that dependency instead of
>>> adding rsa to beam's setup.py.
>>>
>>
>> oauth2client depends on rsa>=3.14 with no upper limit. rsa 4.1 was
>> released today.
>> The places that require rsa<4.1 are deeper in the dependency tree.
>> For example:
>>
>> google-cloud-bigquery==1.24.0
>>   - google-api-core [required: >=1.15.0,<2.0dev, installed: 1.20.0]
>> - google-auth [required: >=1.14.0,<2.0dev, installed: 1.16.1]
>>   - rsa [required: >=3.1.4,<4.1, installed: 4.1]
>>
>>
>>>

> On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri 
> wrote:
>
>> Thanks, that helped in an unexpected way. :)
>> I should have used the "gcp" extra instead of "cloud" in my pip
>> install command above.
>>
>> On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>>
>>> > Any ideas on how to debug where this requirement is coming
>>> from?
>>> You could try installing and calling pipdeptree [1] from a
>>> Jenkins job, and see if it helps.
>>>
>>> [1] https://pypi.org/project/pipdeptree/
>>> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri 
>>> wrote:
>>>
 Hi,
 I'm trying to understand these "pip check" failures:

 ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but 
 you'll have rsa 4.1 which is incompatible



 https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console

 However, when I do
 pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]

 locally, the google-auth package is not installed at all.
 Any ideas on how to debug where this requirement is coming from?

>>>


Re: contributor permission for Beam Jira tickets

2020-06-10 Thread Ahmet Altay
Done. Welcome!

On Wed, Jun 10, 2020 at 12:37 PM Stuart Perks 
wrote:

> Hi,
>
> Can I be added to the JIRA contributor so I can assign a Jira to myself
> please?
>
> User Name: Perks
>
> Thanks,
>
> Stuart
>


Re: [External] Re: Ensuring messages are processed and emitted in-order

2020-06-10 Thread Catlyn Kong
Thank y’all for the input!


About the RequiresTimeSortedInput, we were thinking of the following 2
potential approaches:

   1. Assign the Kafka offset as the timestamp while doing a GroupByKey on
      partition_id in a GlobalWindow.
   2. Rely on the fact that Flink consumes from Kafka partitions in offset
      order and assign ingestion time as the timestamp. (We're using our own
      non-KafkaIO based Kafka consumer extended from FlinkKafkaConsumer011
      and thus have direct control over timestamp and watermark assignment.)

We find it non-trivial to reason about watermark assignment, especially when
taking into consideration that:

   1. there might be restarts at any given time, and
   2. advancing the watermark in one Kafka partition might result in:
      1. dropping elements from other Kafka partitions (if we're not
         following the native Flink approach where we take the lowest
         watermark when merging streams), or
      2. delaying output from other Kafka partitions since they'll be
         buffered.

Is there any recommendation on how this should be handled?
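
To make the "take the lowest watermark when merging streams" rule concrete,
here is a tiny sketch in plain Python. It is not Flink or Beam code; the
function name and the sample numbers are made up for illustration, and it
assumes one watermark value per Kafka partition.

```python
def merged_watermark(partition_watermarks):
    """Return the watermark of the merged stream.

    partition_watermarks maps a partition id to that partition's current
    watermark. The merged watermark is the minimum, so no partition can be
    "overtaken" by a faster one. At least one partition is required.
    """
    if not partition_watermarks:
        raise ValueError("need at least one partition watermark")
    return min(partition_watermarks.values())


if __name__ == "__main__":
    watermarks = {0: 100, 1: 40, 2: 75}
    print(merged_watermark(watermarks))  # held back by partition 1

    # Advancing only the slowest partition moves the merged watermark forward,
    # but only up to the next-slowest partition.
    watermarks[1] = 90
    print(merged_watermark(watermarks))
```

Under this rule nothing from a slow partition is dropped as late, at the cost
of buffering output from the faster partitions until the slowest one catches
up.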

In the direction of using a StatefulDoFn to buffer and reorder, we're
concerned about performance, since we would need to serialize and deserialize
the entire BagState (with all the messages) every time we process a message,
and we would potentially need to insert this StatefulDoFn in multiple places
in the pipeline. Is there any benchmark result for a pipeline that does
something similar that we could reference?

The proposal for a sorted state API sounds promising. Is there a ticket/doc
that we can follow?
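
For reference, the buffer-and-reorder logic we have in mind looks roughly like
the sketch below. It is plain Python rather than an actual Beam StatefulDoFn,
the class name is invented for this example, and it assumes that offsets
within a partition are unique, dense (no gaps), and start at a known value.
In a real pipeline the heap would live in per-key state.

```python
import heapq


class ReorderBuffer:
    """Buffers out-of-order (offset, value) pairs and releases them in order."""

    def __init__(self, first_offset=0):
        self._next_offset = first_offset  # next offset allowed to be emitted
        self._heap = []                   # min-heap keyed on offset

    def add(self, offset, value):
        """Add one element and return the values that are now ready, in order."""
        heapq.heappush(self._heap, (offset, value))
        ready = []
        # Drain the heap while its smallest offset is exactly the one we expect.
        while self._heap and self._heap[0][0] == self._next_offset:
            ready.append(heapq.heappop(self._heap)[1])
            self._next_offset += 1
        return ready


if __name__ == "__main__":
    buffer = ReorderBuffer()
    for offset, value in [(1, "b"), (0, "a"), (3, "d"), (2, "c")]:
        print(offset, "->", buffer.add(offset, value))
```

Each call only touches the head of the heap, which is the access pattern we
would hope a sorted state API could provide without rewriting the whole
buffer on every element.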


On Wed, Jun 10, 2020 at 1:28 PM Reuven Lax  wrote:

> I don't know how well RequiresTimeSortedInput will work for any late data.
>
> I think you will want to include the Kafka offset in your records (unless
> the records have their own sequence number) and then use state to buffer
> and sort. There is a proposal (and work in progress) for a sorted state
> API, which will make this easier and more efficient.
>
> Reuven
>
> On Wed, Jun 10, 2020 at 1:25 PM Luke Cwik  wrote:
>
>> For runners that support @RequiresTimeSortedInput, all your input will
>> come time sorted (as long as your element's timestamp tracks the order that
>> you want).
>> For runners that don't support this, you need to build a StatefulDoFn
>> that buffers out of order events and reorders them to the order that you
>> need.
>>
>> @Pablo Estrada  Any other suggestions for supporting
>> CDC type pipelines?
>>
>> On Tue, Jun 9, 2020 at 6:59 PM Catlyn Kong  wrote:
>>
>>> Thanks a lot for the response!
>>>
>>> We have several business use cases that rely strongly on ordering by
>>> Kafka offset:
>>> 1) streaming unwindowed inner join: say we want to join users with
>>> reviews on user_id. Here are the schemas for two streams:
>>> user:
>>>
>>>    - user_id
>>>    - name
>>>    - timestamp
>>>
>>> reviews:
>>>
>>>    - review_id
>>>    - user_id
>>>    - timestamp
>>>
>>> Here are the messages in each stream ordered by kafka offset:
>>> user:
>>> (1, name_a, 60), (2, name_b, 120), (1, name_c, 240)
>>> reviews:
>>> (ABC, 1, 90), (DEF, 2, 360)
>>> I would expect to receive following output messages:
>>> (1, name_a, ABC) at timestamp 90
>>> (1, name_c, ABC) at timestamp 240
>>> (2, name_b, DEF) at timestamp 360
>>> This can be done in native Flink since Flink kafka consumer reads from
>>> each partition sequentially. But without an ordering guarantee, we can end
>>> up with arbitrary results. So how would we implement this in Beam?
>>> 2) unwindowed aggregation: aggregate all the employees for every
>>> organization. Say we have a new employee stream with the following schema:
>>> new_employee:
>>>
>>>    - organization_id
>>>    - employee_name
>>>
>>> And here are the messages ordered by Kafka offset:
>>> (1, name_a), (2, name_b), (2, name_c), (1, name_d)
>>> I would expect the output to be:
>>> (1, [name_a]), (2, [name_b]), (2, [name_b, name_c]), (1, [name_a,
>>> name_d])
>>> Again without an ordering guarantee, the result is non deterministic.
>>>
>>> Change data capture (CDC) streams are a very common use case for our
>>> data pipeline. As in the examples above we rely on Kafka offsets to make
>>> sure we process data mutations in the proper order. While in some cases we
>>> have Flink native solutions to these problems (Flink provides ordering
>>> guarantees within the chosen key), we are now building some new Beam
>>> applications that would require ordering guarantees. What is the
>>> recommended approach in Beam for such use cases? If this isn’t currently
>>> supported, do we have any near plan to add native ordering support in Beam?
>>>
>>>
>>> On 2020/06/09 20:37:22, Luke Cwik  wrote:
>>> > This will likely break due to:
>>> > * workers can have more than one thread and hence process the source in
>>> > parallel
>>> > * splitting a source allows for the source to be broken up into multiple
>>> > restrictions and hence the runner can process

Re: Beam Summit Status Report - 6/10

2020-06-10 Thread Ahmet Altay
Thank you Brittany and all others working on this. Progress looks good. :)

On Wed, Jun 10, 2020 at 4:56 PM Brittany Hermann 
wrote:

> Hi folks,
>
> I wanted to provide you with the Beam Summit Status report from today's
> meeting. If you would like to join the next public meeting on Wednesday,
> June 24th at 11:30 AM PST please let me know and I will send a calendar
> invite over to you!
>
> Also, don't forget to submit your CFP by June 15th and register for the
> Summit!
>
> 
>
> https://docs.google.com/document/d/11PXOBUbeldgPqz6OlTswCal6SxyX76Bb_ZVKBdwsd7o/edit?usp=sharing
>
> Have a great day!
>
> --
>
> Brittany Hermann
>
> Open Source Program Manager (Provided by Adecco Staffing)
>
> 1190 Bordeaux Drive , Building 4, Sunnyvale, CA 94089
> 
>
>
>


Re: python precommit error - google-auth depenedency?

2020-06-10 Thread Ahmet Altay
On Wed, Jun 10, 2020 at 7:11 PM Bu Sun Kim  wrote:

> Hi,
>
> google-auth has been released (with the wider pin on rsa).
>

Thank you! Much appreciated!


>
> On Wed, Jun 10, 2020 at 6:07 PM Ahmet Altay  wrote:
>
>>
>>
>> On Wed, Jun 10, 2020 at 4:07 PM Kyle Weaver  wrote:
>>
>>> The fix to google-auth has been merged. Is the plan just to wait until a
>>> new version of google-auth is released and ignore the failing tests until
>>> then? (btw I filed a JIRA for this before I realized it was already being
>>> discussed here: https://issues.apache.org/jira/browse/BEAM-10232)
>>>
>>
>> Could we add it as a test dependency? Or if that is not possible, add it
>> but remove it before next release?
>>
>> It seems like there is a release PR on google-auth (
>> https://github.com/googleapis/google-auth-library-python/pull/525). I
>> asked +Bu Sun Kim  on the PR, they usually release
>> pretty quickly.
>>
>>
>>>
>>> On Wed, Jun 10, 2020 at 3:21 PM Udi Meiri  wrote:
>>>
 Yes you're right, Py2 envs are still using 4.0.

 On Wed, Jun 10, 2020 at 3:03 PM Ahmet Altay  wrote:

>
>
> On Wed, Jun 10, 2020 at 2:25 PM Udi Meiri  wrote:
>
>> 4.1 drops Python 2 support, so I'm not sure if we're ready for that
>> yet.
>>
>
> Wouldn't that work by default? In python 2 oauth2client's rsa>3.14
> requirement will resolve to latest python2 supporting version of rsa 
> (4.0?)
>
>
>>
>> On Wed, Jun 10, 2020 at 2:20 PM Ahmet Altay  wrote:
>>
>>> Looks like there is an attempt to fix this:
>>> https://github.com/googleapis/google-auth-library-python/pull/524
>>>
>>> On Wed, Jun 10, 2020 at 2:07 PM Udi Meiri  wrote:
>>>


 On Wed, Jun 10, 2020 at 1:59 PM Ahmet Altay 
 wrote:

>
>
> On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles 
> wrote:
>
>> You may be interested in following
>> https://github.com/pypa/pip/issues/988 if you are not already.
>>
>> Kenn
>>
>> On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri 
>> wrote:
>>
>>> Seems like manually installing rsa==4.0 satisfies deps, but pip
>>> doesn't do transitive deps well.
>>>
>>> Would it be right to put a direct dependency on rsa<4.1,>=3.1.4
>>> in setup.py?
>>>
>>
> Did you find where the google-auth dependency is coming from? We
> might try to fix the problem at the source of that dependency instead 
> of
> adding rsa to beam's setup.py.
>

 oauth2client depends on rsa>=3.14 with no upper limit. rsa 4.1 was
 released today.
 The places that require rsa<4.1 are deeper in the dependency tree.
 For example:

 google-cloud-bigquery==1.24.0
   - google-api-core [required: >=1.15.0,<2.0dev, installed: 1.20.0]
 - google-auth [required: >=1.14.0,<2.0dev, installed: 1.16.1]
   - rsa [required: >=3.1.4,<4.1, installed: 4.1]


>
>>
>>> On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri 
>>> wrote:
>>>
 Thanks, that helped in an unexpected way. :)
 I should have used the "gcp" extra instead of "cloud" in my pip
 install command above.

 On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> > Any ideas on how to debug where this requirement is coming
> from?
> You could try installing and calling pipdeptree [1] from a
> Jenkins job, and see if it helps.
>
> [1] https://pypi.org/project/pipdeptree/
> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri 
> wrote:
>
>> Hi,
>> I'm trying to understand these "pip check" failures:
>>
>> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but 
>> you'll have rsa 4.1 which is incompatible
>>
>>
>>
>> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>>
>> However, when I do
>> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>>
>> locally, the google-auth package is not installed at all.
>> Any ideas on how to debug where this requirement is coming
>> from?
>>
>


Re: python precommit error - google-auth depenedency?

2020-06-10 Thread Ahmet Altay
On Wed, Jun 10, 2020 at 2:25 PM Udi Meiri  wrote:

> 4.1 drops Python 2 support, so I'm not sure if we're ready for that yet.
>

Wouldn't that work by default? In Python 2, oauth2client's rsa>=3.1.4
requirement will resolve to the latest version of rsa that supports Python 2
(4.0?).
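
The reasoning, as I understand it, is that pip skips releases whose declared
Python requirement excludes the running interpreter and then takes the newest
remaining one. A minimal sketch of that selection in plain Python follows. The
release table is illustrative data assumed for this example (not read from
PyPI), and pick_latest is a made-up helper, not a pip API.

```python
# Illustrative release metadata: version and the minimum Python it supports.
# These values are assumptions for the sketch, not authoritative metadata.
RELEASES = [
    {"version": (3, 4, 2), "min_python": (2, 6)},
    {"version": (4, 0), "min_python": (2, 7)},
    {"version": (4, 1), "min_python": (3, 5)},
]


def pick_latest(releases, python_version):
    """Return the newest version usable on python_version, or None."""
    compatible = [
        release["version"]
        for release in releases
        if release["min_python"] <= python_version
    ]
    return max(compatible) if compatible else None


if __name__ == "__main__":
    print(pick_latest(RELEASES, (2, 7)))  # newest release a Python 2.7 env may use
    print(pick_latest(RELEASES, (3, 7)))  # newest release a Python 3.7 env may use
```

Under these assumed values a Python 2.7 environment ends up on 4.0 while a
Python 3.7 environment picks 4.1, so only the Python 3 environments would hit
the google-auth pin.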


>
> On Wed, Jun 10, 2020 at 2:20 PM Ahmet Altay  wrote:
>
>> Looks like there is an attempt to fix this:
>> https://github.com/googleapis/google-auth-library-python/pull/524
>>
>> On Wed, Jun 10, 2020 at 2:07 PM Udi Meiri  wrote:
>>
>>>
>>>
>>> On Wed, Jun 10, 2020 at 1:59 PM Ahmet Altay  wrote:
>>>


 On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles 
 wrote:

> You may be interested in following
> https://github.com/pypa/pip/issues/988 if you are not already.
>
> Kenn
>
> On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri  wrote:
>
>> Seems like manually installing rsa==4.0 satisfies deps, but pip
>> doesn't do transitive deps well.
>>
>> Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in
>> setup.py?
>>
>
 Did you find where the google-auth dependency is coming from? We might
 try to fix the problem at the source of that dependency instead of adding
 rsa to beam's setup.py.

>>>
>>> oauth2client depends on rsa>=3.14 with no upper limit. rsa 4.1 was
>>> released today.
>>> The places that require rsa<4.1 are deeper in the dependency tree. For
>>> example:
>>>
>>> google-cloud-bigquery==1.24.0
>>>   - google-api-core [required: >=1.15.0,<2.0dev, installed: 1.20.0]
>>> - google-auth [required: >=1.14.0,<2.0dev, installed: 1.16.1]
>>>   - rsa [required: >=3.1.4,<4.1, installed: 4.1]
>>>
>>>

>
>> On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri  wrote:
>>
>>> Thanks, that helped in an unexpected way. :)
>>> I should have used the "gcp" extra instead of "cloud" in my pip
>>> install command above.
>>>
>>> On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 > Any ideas on how to debug where this requirement is coming from?
 You could try installing and calling pipdeptree [1] from a Jenkins
 job, and see if it helps.

 [1] https://pypi.org/project/pipdeptree/
 On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri 
 wrote:

> Hi,
> I'm trying to understand these "pip check" failures:
>
> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll 
> have rsa 4.1 which is incompatible
>
>
>
> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>
> However, when I do
> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>
> locally, the google-auth package is not installed at all.
> Any ideas on how to debug where this requirement is coming from?
>



Re: python precommit error - google-auth depenedency?

2020-06-10 Thread Udi Meiri
Yes, you're right: Py2 envs are still using rsa 4.0.

On Wed, Jun 10, 2020 at 3:03 PM Ahmet Altay  wrote:

>
>
> On Wed, Jun 10, 2020 at 2:25 PM Udi Meiri  wrote:
>
>> 4.1 drops Python 2 support, so I'm not sure if we're ready for that yet.
>>
>
> Wouldn't that work by default? In python 2 oauth2client's rsa>3.14
> requirement will resolve to latest python2 supporting version of rsa (4.0?)
>
>
>>
>> On Wed, Jun 10, 2020 at 2:20 PM Ahmet Altay  wrote:
>>
>>> Looks like there is an attempt to fix this:
>>> https://github.com/googleapis/google-auth-library-python/pull/524
>>>
>>> On Wed, Jun 10, 2020 at 2:07 PM Udi Meiri  wrote:
>>>


 On Wed, Jun 10, 2020 at 1:59 PM Ahmet Altay  wrote:

>
>
> On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles 
> wrote:
>
>> You may be interested in following
>> https://github.com/pypa/pip/issues/988 if you are not already.
>>
>> Kenn
>>
>> On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri  wrote:
>>
>>> Seems like manually installing rsa==4.0 satisfies deps, but pip
>>> doesn't do transitive deps well.
>>>
>>> Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in
>>> setup.py?
>>>
>>
> Did you find where the google-auth dependency is coming from? We might
> try to fix the problem at the source of that dependency instead of adding
> rsa to beam's setup.py.
>

 oauth2client depends on rsa>=3.14 with no upper limit. rsa 4.1 was
 released today.
 The places that require rsa<4.1 are deeper in the dependency tree. For
 example:

 google-cloud-bigquery==1.24.0
   - google-api-core [required: >=1.15.0,<2.0dev, installed: 1.20.0]
     - google-auth [required: >=1.14.0,<2.0dev, installed: 1.16.1]
       - rsa [required: >=3.1.4,<4.1, installed: 4.1]


>
>>
>>> On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri  wrote:
>>>
 Thanks, that helped in an unexpected way. :)
 I should have used the "gcp" extra instead of "cloud" in my pip
 install command above.

 On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> > Any ideas on how to debug where this requirement is coming from?
> You could try installing and calling pipdeptree [1] from a Jenkins
> job, and see if it helps.
>
> [1] https://pypi.org/project/pipdeptree/
> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri 
> wrote:
>
>> Hi,
>> I'm trying to understand these "pip check" failures:
>>
>> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll have rsa 4.1 which is incompatible
>>
>>
>>
>> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>>
>> However, when I do
>> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>>
>> locally, the google-auth package is not installed at all.
>> Any ideas on how to debug where this requirement is coming from?
>>
>


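For the question quoted above about where the rsa requirement comes from, a
rough stand-in for what pipdeptree reports can be written with the standard
library alone. This is only a sketch: the helper names requirement_name and
reverse_requirements are made up for illustration, it assumes Python 3.8+
(for importlib.metadata), and it only sees packages installed in the
environment it runs in.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        return None
    # Treat runs of "-", "_" and "." as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def reverse_requirements(target):
    """Map each installed distribution that requires `target` to its requirement strings."""
    target = re.sub(r"[-_.]+", "-", target).lower()
    found = {}
    for dist in metadata.distributions():
        # dist.requires is None when a distribution declares no requirements.
        for requirement in dist.requires or []:
            if requirement_name(requirement) == target:
                name = dist.metadata.get("Name", "<unknown>")
                found.setdefault(name, []).append(requirement)
    return found


if __name__ == "__main__":
    # List every installed package that declares a requirement on rsa.
    for name, requirements in sorted(reverse_requirements("rsa").items()):
        print(name, requirements)
```

Run inside the failing environment, this lists each package that declares a
requirement on rsa together with the version bounds it asks for.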


Re: python precommit error - google-auth dependency?

2020-06-10 Thread Kyle Weaver
The fix to google-auth has been merged. Is the plan just to wait until a
new version of google-auth is released and ignore the failing tests until
then? (btw I filed a JIRA for this before I realized it was already being
discussed here: https://issues.apache.org/jira/browse/BEAM-10232)
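
To make the "pip check" failure concrete, here is a minimal sketch of the
comparison behind the message "google-auth 1.16.1 has requirement
rsa<4.1,>=3.1.4, but you'll have rsa 4.1". It is not how pip implements the
check: the satisfies helper is hypothetical, and it handles only plain numeric
versions, not the full PEP 440 syntax (a bound such as 2.0dev is rejected).

```python
import operator
import re

# Comparison operators supported by this simplified checker.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def parse_version(text):
    """Turn a purely numeric dotted version such as '3.1.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause of `specifier`."""
    installed = parse_version(version)
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>)\s*([0-9]+(?:\.[0-9]+)*)\s*", clause)
        if match is None:
            raise ValueError("unsupported specifier clause: %r" % clause)
        op, bound_text = match.groups()
        bound = parse_version(bound_text)
        # Pad with zeros so that 4.1 and 4.1.0 compare as equal.
        width = max(len(installed), len(bound))
        left = installed + (0,) * (width - len(installed))
        right = bound + (0,) * (width - len(bound))
        if not OPERATORS[op](left, right):
            return False
    return True


if __name__ == "__main__":
    print(satisfies("4.1", "<4.1,>=3.1.4"))  # the failing combination
    print(satisfies("4.0", "<4.1,>=3.1.4"))  # the manual rsa==4.0 workaround
```

A requirement with only a lower bound accepts rsa 4.1, while the upper bound
in google-auth's requirement rejects it, which is why the two requirements
conflict once 4.1 is the newest release.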

On Wed, Jun 10, 2020 at 3:21 PM Udi Meiri  wrote:

> Yes you're right, Py2 envs are still using 4.0.
>
> On Wed, Jun 10, 2020 at 3:03 PM Ahmet Altay  wrote:
>
>>
>>
>> On Wed, Jun 10, 2020 at 2:25 PM Udi Meiri  wrote:
>>
>>> 4.1 drops Python 2 support, so I'm not sure if we're ready for that yet.
>>>
>>
>> Wouldn't that work by default? In Python 2, oauth2client's rsa>=3.1.4
>> requirement will resolve to the latest Python 2-supporting version of rsa (4.0?)
>>
>>
>>>
>>> On Wed, Jun 10, 2020 at 2:20 PM Ahmet Altay  wrote:
>>>
 Looks like there is an attempt to fix this:
 https://github.com/googleapis/google-auth-library-python/pull/524

 On Wed, Jun 10, 2020 at 2:07 PM Udi Meiri  wrote:

>
>
> On Wed, Jun 10, 2020 at 1:59 PM Ahmet Altay  wrote:
>
>>
>>
>> On Wed, Jun 10, 2020 at 1:29 PM Kenneth Knowles 
>> wrote:
>>
>>> You may be interested in following
>>> https://github.com/pypa/pip/issues/988 if you are not already.
>>>
>>> Kenn
>>>
>>> On Wed, Jun 10, 2020 at 12:17 PM Udi Meiri  wrote:
>>>
 Seems like manually installing rsa==4.0 satisfies deps, but pip
 doesn't do transitive deps well.

 Would it be right to put a direct dependency on rsa<4.1,>=3.1.4 in
 setup.py?

>>>
>> Did you find where the google-auth dependency is coming from? We
>> might try to fix the problem at the source of that dependency instead of
>> adding rsa to beam's setup.py.
>>
>
> oauth2client depends on rsa>=3.1.4 with no upper limit. rsa 4.1 was
> released today.
> The places that require rsa<4.1 are deeper in the dependency tree. For
> example:
>
> google-cloud-bigquery==1.24.0
>   - google-api-core [required: >=1.15.0,<2.0dev, installed: 1.20.0]
>     - google-auth [required: >=1.14.0,<2.0dev, installed: 1.16.1]
>       - rsa [required: >=3.1.4,<4.1, installed: 4.1]
>
>
>>
>>>
 On Wed, Jun 10, 2020 at 11:48 AM Udi Meiri 
 wrote:

> Thanks, that helped in an unexpected way. :)
> I should have used the "gcp" extra instead of "cloud" in my pip
> install command above.
>
> On Wed, Jun 10, 2020 at 11:37 AM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> > Any ideas on how to debug where this requirement is coming from?
>> You could try installing and calling pipdeptree [1] from a
>> Jenkins job, and see if it helps.
>>
>> [1] https://pypi.org/project/pipdeptree/
>> On Wed, Jun 10, 2020 at 11:00 AM Udi Meiri 
>> wrote:
>>
>>> Hi,
>>> I'm trying to understand these "pip check" failures:
>>>
>>> ERROR: google-auth 1.16.1 has requirement rsa<4.1,>=3.1.4, but you'll have rsa 4.1 which is incompatible
>>>
>>>
>>>
>>> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2860/console
>>>
>>> However, when I do
>>> pip install dist/apache-beam-2.23.0.dev0.tar.gz[test,cloud]
>>>
>>> locally, the google-auth package is not installed at all.
>>> Any ideas on how to debug where this requirement is coming from?
>>>
>>