Re: Next LTS?

2020-03-24 Thread Robert Bradshaw
I would want to avoid maintaining a Python 2 LTS, if only because the
infrastructure might not be there.

On Tue, Mar 24, 2020 at 3:58 PM Valentyn Tymofieiev  wrote:
>
> Yes, we had a suggestion to pick a stable Python 2 release as an LTS. The
> suggestion assumed that LTS releases would continue to exist. Now, if Python 2
> is the only reason to have an LTS, we can consider it as long as:
> - we scope the LTS portion to the Python SDK only.
> - we have an ownership story for the Python 2 LTS, for example, volunteers in
> the dev or user community who would be willing to maintain that release.
>
> We can bring this up when we drop Python 2 support. We decided to revisit 
> that conversation in a couple of months IIRC.
>
> On Tue, Mar 24, 2020 at 3:44 PM Ahmet Altay  wrote:
>>
>> Removing it makes sense. We did not have a good way of measuring the demand
>> for LTS releases.
>>
>> There was a suggestion to mark the last release with Python 2 support as an
>> LTS release; was there a conclusion on that? ( +Valentyn Tymofieiev )
>>
>> Ahmet
>>
>> On Tue, Mar 24, 2020 at 2:34 PM Robert Bradshaw  wrote:
>>>
>>> There seems to have been a lack of demand. I agree we should remove
>>> these statements from our site until we find a reason to revisit
>>> doing LTS releases.
>>>
>>> On Tue, Mar 24, 2020 at 2:23 PM Austin Bennett
>>>  wrote:
>>> >
>>> > What's our LTS policy these days?  It seems we should remove the
>>> > following from our site (and encourage GCP to do the same, below), if
>>> > we're not going to maintain these.  I'll update the policy page via PR,
>>> > if I get the go-ahead that this is our desire.  It seems we can't suggest
>>> > policies in a policy doc that we don't follow...?
>>> >
>>> > I am not trying to suggest demand for LTS.  If others haven't spoken up,
>>> > that also indicates a lack of demand.  The point of my message is that we
>>> > should update our Policies doc if it isn't what we are practicing (and we
>>> > can re-add it later if we want to revive LTS).
>>> >
>>> > https://beam.apache.org/community/policies/
>>> >
>>> > Apache Beam aims to make 8 releases in a 12 month period. To accommodate 
>>> > users with longer upgrade cycles, some of these releases will be tagged 
>>> > as long term support (LTS) releases. LTS releases receive patches to fix 
>>> > major issues for 12 months, starting from the release’s initial release 
>>> > date. There will be at least one new LTS release in a 12 month period, 
>>> > and LTS releases are considered deprecated after 12 months. The community 
>>> > will mark a release as a LTS release based on various factors, such as 
>>> > the number of LTS releases currently in flight and whether the 
>>> > accumulated feature set since the last LTS provides significant upgrade 
>>> > value. Non-LTS releases do not receive patches and are considered 
>>> > deprecated immediately after the next following minor release. We 
>>> > encourage you to update early and often; do not wait until the 
>>> > deprecation date of the version you are using.
>>> >
>>> >
>>> >
>>> >
>>> > Seems a Google Specific Concern, but related to the community:  
>>> > https://cloud.google.com/dataflow/docs/support/sdk-version-support-status#apache-beam-sdks-2x
>>> >
>>> > Apache Beam is an open source, community-led project. Google is part of 
>>> > the community, but we do not own the project or control the release 
>>> > process. We might open bugs or submit patches to the Apache Beam codebase 
>>> > on behalf of Dataflow customers, but we cannot create hotfixes or 
>>> > official releases of Apache Beam on demand.
>>> >
>>> > However, the Apache Beam community designates specific releases as long 
>>> > term support (LTS) releases. LTS releases receive patches to fix major 
>>> > issues for a designated period of time. See the Apache Beam policies page 
>>> > for more details about release policies.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Sep 19, 2019 at 5:01 PM Ahmet Altay  wrote:
>>> >>
>>> >> I agree with retiring 2.7 as the LTS family. Based on my experience with
>>> >> users, 2.7 does not have particularly high adoption and, as pointed out,
>>> >> has known critical issues. Declaring another LTS pending demand sounds
>>> >> reasonable, but how are we going to gauge this demand?
>>> >>
>>> >> +Yifan Zou +Alan Myrvold on the tooling question as well. Unless we 
>>> >> address the tooling problem it seems difficult to feasibly maintain LTS 
>>> >> versions over time.
>>> >>
>>> >> On Thu, Sep 19, 2019 at 3:45 PM Austin Bennett 
>>> >>  wrote:
>>> >>>
>>> >>> To be clear, I was picking on - or reminding us of - the promise: I 
>>> >>> don't have a strong personal need/desire (at least currently) for LTS 
>>> >>> to exist.  Though, worth ensuring we live up to what we keep on the 
>>> >>> website.  And, without an active LTS, probably something we should take 
>>> >>> off the site?
>>> >>>
>>> >>> On Thu, Sep 19, 2019 at 1:33 PM Pablo Estrada  
>>> >>> wrote:

Re: Jenkins jobs not running for my PR 10438

2020-03-24 Thread Ahmet Altay
Done.

On Tue, Mar 24, 2020 at 4:32 PM Shoaib Zafar 
wrote:

> Hi Beam Committers!
>
> I would appreciate it if someone could trigger the tests on my PR.
> https://github.com/apache/beam/pull/11210
>
> It's currently WIP and I need to verify that the code passes all the tests.
>
> Thanks in advance!
>
> Best.
>
> *Shoaib Zafar*
> Software Engineering Lead
> Mobile: +92 333 274 6242
> Skype: live:shoaibzafar_1
>
> 
>
>
> On Wed, Mar 25, 2020 at 3:37 AM Ahmet Altay  wrote:
>
>> Kind of done. 11208 is running the tests. 11156 did not start all the
>> tests. Someone could retry it again in a bit.
>>
>> On Tue, Mar 24, 2020 at 12:11 PM Tomo Suzuki  wrote:
>>
>>> Hi Beam committers,
>>>
>>> Would you trigger precommit checks for
>>> https://github.com/apache/beam/pull/11156 and
>>> https://github.com/apache/beam/pull/11208 with the following 6 checks?
>>> Run Java PostCommit
>>> Run Java HadoopFormatIO Performance Test
>>> Run BigQueryIO Streaming Performance Test Java
>>> Run Dataflow ValidatesRunner
>>> Run Spark ValidatesRunner
>>> Run SQL Postcommit
>>>
>>>
>>> On Fri, Mar 20, 2020 at 11:13 AM Jan Lukavský  wrote:
>>>
 Hi, done.
 On 3/20/20 3:59 PM, Tomo Suzuki wrote:

 Hi Beam committers,
 (Thanks Ahmet)

 Would you re-run the presubmit checks for
 https://github.com/apache/beam/pull/11168 with the following commands?
 Run Java PostCommit
 Run Java HadoopFormatIO Performance Test
 Run BigQueryIO Streaming Performance Test Java
 Run Dataflow ValidatesRunner
 Run Spark ValidatesRunner
 Run SQL Postcommit


 On Wed, Mar 18, 2020 at 9:09 PM Ahmet Altay  wrote:

> Done.
>
> On Wed, Mar 18, 2020 at 5:57 PM Tomo Suzuki 
> wrote:
>
>> Hi Beam committers,
>> (Alexey, thank you!)
>>
>> 1. Would you run the 2 failed checks Java PreCommit and Python
>> PreCommit for https://github.com/apache/beam/pull/11156
>>
>> 2. Would you run the precommit checks for
>> https://github.com/apache/beam/pull/11168 with the following 6
>> commands?
>> Run Java PostCommit
>> Run Java HadoopFormatIO Performance Test
>> Run BigQueryIO Streaming Performance Test Java
>> Run Dataflow ValidatesRunner
>> Run Spark ValidatesRunner
>> Run SQL Postcommit
>>
>>
>>
>>
>>
>> On Wed, Mar 18, 2020 at 1:08 PM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Done
>>>
>>> On 18 Mar 2020, at 15:25, Tomo Suzuki  wrote:
>>>
>>> Hi Beam committers,
>>>
>>> Would you trigger the precommit checks for
>>> https://github.com/apache/beam/pull/11156 with the following 6
>>> commands ?
>>>
>>> Run Java PostCommit
>>> Run Java HadoopFormatIO Performance Test
>>> Run BigQueryIO Streaming Performance Test Java
>>> Run Dataflow ValidatesRunner
>>> Run Spark ValidatesRunner
>>> Run SQL Postcommit
>>>
>>> Regards,
>>> Tomo
>>>
>>>
>>>
>>
>> --
>> Regards,
>> Tomo
>>
>

 --
 Regards,
 Tomo


>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>


Re: Jenkins jobs not running for my PR 10438

2020-03-24 Thread Shoaib Zafar
Hi Beam Committers!

I would appreciate it if someone could trigger the tests on my PR.
https://github.com/apache/beam/pull/11210

It's currently WIP and I need to verify that the code passes all the tests.

Thanks in advance!

Best.

*Shoaib Zafar*
Software Engineering Lead
Mobile: +92 333 274 6242
Skype: live:shoaibzafar_1




On Wed, Mar 25, 2020 at 3:37 AM Ahmet Altay  wrote:

> Kind of done. 11208 is running the tests. 11156 did not start all the
> tests. Someone could retry it again in a bit.
>
> On Tue, Mar 24, 2020 at 12:11 PM Tomo Suzuki  wrote:
>
>> Hi Beam committers,
>>
>> Would you trigger precommit checks for
>> https://github.com/apache/beam/pull/11156 and
>> https://github.com/apache/beam/pull/11208 with the following 6 checks?
>> Run Java PostCommit
>> Run Java HadoopFormatIO Performance Test
>> Run BigQueryIO Streaming Performance Test Java
>> Run Dataflow ValidatesRunner
>> Run Spark ValidatesRunner
>> Run SQL Postcommit
>>
>>
>> On Fri, Mar 20, 2020 at 11:13 AM Jan Lukavský  wrote:
>>
>>> Hi, done.
>>> On 3/20/20 3:59 PM, Tomo Suzuki wrote:
>>>
>>> Hi Beam committers,
>>> (Thanks Ahmet)
>>>
>>> Would you re-run the presubmit checks for
>>> https://github.com/apache/beam/pull/11168 with the following commands?
>>> Run Java PostCommit
>>> Run Java HadoopFormatIO Performance Test
>>> Run BigQueryIO Streaming Performance Test Java
>>> Run Dataflow ValidatesRunner
>>> Run Spark ValidatesRunner
>>> Run SQL Postcommit
>>>
>>>
>>> On Wed, Mar 18, 2020 at 9:09 PM Ahmet Altay  wrote:
>>>
 Done.

 On Wed, Mar 18, 2020 at 5:57 PM Tomo Suzuki  wrote:

> Hi Beam committers,
> (Alexey, thank you!)
>
> 1. Would you run the 2 failed checks Java PreCommit and Python
> PreCommit for https://github.com/apache/beam/pull/11156
>
> 2. Would you run the precommit checks for
> https://github.com/apache/beam/pull/11168 with the following 6
> commands?
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
>
>
>
>
> On Wed, Mar 18, 2020 at 1:08 PM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Done
>>
>> On 18 Mar 2020, at 15:25, Tomo Suzuki  wrote:
>>
>> Hi Beam committers,
>>
>> Would you trigger the precommit checks for
>> https://github.com/apache/beam/pull/11156 with the following 6
>> commands ?
>>
>> Run Java PostCommit
>> Run Java HadoopFormatIO Performance Test
>> Run BigQueryIO Streaming Performance Test Java
>> Run Dataflow ValidatesRunner
>> Run Spark ValidatesRunner
>> Run SQL Postcommit
>>
>> Regards,
>> Tomo
>>
>>
>>
>
> --
> Regards,
> Tomo
>

>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>>
>>
>> --
>> Regards,
>> Tomo
>>
>


Re: Next LTS?

2020-03-24 Thread Valentyn Tymofieiev
Yes, we had a suggestion to pick a stable Python 2 release as an LTS. The
suggestion assumed that LTS releases would continue to exist. Now, if Python 2
is the only reason to have an LTS, we can consider it as long as:
- we scope the LTS portion to the Python SDK only.
- we have an ownership story for the Python 2 LTS, for example, volunteers in
the dev or user community who would be willing to maintain that release.

We can bring this up when we drop Python 2 support. We decided to revisit
that conversation in a couple of months IIRC.

On Tue, Mar 24, 2020 at 3:44 PM Ahmet Altay  wrote:

> Removing it makes sense. We did not have a good way of measuring the
> demand for LTS releases.
>
> There was a suggestion to mark the last release with Python 2 support as
> an LTS release; was there a conclusion on that? ( +Valentyn Tymofieiev
>  )
>
> Ahmet
>
> On Tue, Mar 24, 2020 at 2:34 PM Robert Bradshaw 
> wrote:
>
>> There seems to have been a lack of demand. I agree we should remove
>> these statements from our site until we find a reason to revisit
>> doing LTS releases.
>>
>> On Tue, Mar 24, 2020 at 2:23 PM Austin Bennett
>>  wrote:
>> >
>> > What's our LTS policy these days?  It seems we should remove the
>> following from our site (and encourage GCP to do the same, below), if we're
>> not going to maintain these.  I'll update the policy page via PR, if I get
>> the go-ahead that this is our desire.  It seems we can't suggest policies
>> in a policy doc that we don't follow...?
>> >
>> > I am not trying to suggest demand for LTS.  If others haven't spoken
>> up, that also indicates a lack of demand.  The point of my message is that
>> we should update our Policies doc if it isn't what we are practicing (and
>> we can re-add it later if we want to revive LTS).
>> >
>> > https://beam.apache.org/community/policies/
>> >
>> > Apache Beam aims to make 8 releases in a 12 month period. To
>> accommodate users with longer upgrade cycles, some of these releases will
>> be tagged as long term support (LTS) releases. LTS releases receive patches
>> to fix major issues for 12 months, starting from the release’s initial
>> release date. There will be at least one new LTS release in a 12 month
>> period, and LTS releases are considered deprecated after 12 months. The
>> community will mark a release as a LTS release based on various factors,
>> such as the number of LTS releases currently in flight and whether the
>> accumulated feature set since the last LTS provides significant upgrade
>> value. Non-LTS releases do not receive patches and are considered
>> deprecated immediately after the next following minor release. We encourage
>> you to update early and often; do not wait until the deprecation date of
>> the version you are using.
>> >
>> >
>> >
>> >
>> > Seems a Google Specific Concern, but related to the community:
>> https://cloud.google.com/dataflow/docs/support/sdk-version-support-status#apache-beam-sdks-2x
>> >
>> > Apache Beam is an open source, community-led project. Google is part of
>> the community, but we do not own the project or control the release
>> process. We might open bugs or submit patches to the Apache Beam codebase
>> on behalf of Dataflow customers, but we cannot create hotfixes or official
>> releases of Apache Beam on demand.
>> >
>> > However, the Apache Beam community designates specific releases as long
>> term support (LTS) releases. LTS releases receive patches to fix major
>> issues for a designated period of time. See the Apache Beam policies page
>> for more details about release policies.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Sep 19, 2019 at 5:01 PM Ahmet Altay  wrote:
>> >>
>> >> I agree with retiring 2.7 as the LTS family. Based on my experience
>> with users, 2.7 does not have particularly high adoption and, as pointed
>> out, has known critical issues. Declaring another LTS pending demand sounds
>> reasonable, but how are we going to gauge this demand?
>> >>
>> >> +Yifan Zou +Alan Myrvold on the tooling question as well. Unless we
>> address the tooling problem it seems difficult to feasibly maintain LTS
>> versions over time.
>> >>
>> >> On Thu, Sep 19, 2019 at 3:45 PM Austin Bennett <
>> whatwouldausti...@gmail.com> wrote:
>> >>>
>> >>> To be clear, I was picking on - or reminding us of - the promise: I
>> don't have a strong personal need/desire (at least currently) for LTS to
>> exist.  Though, worth ensuring we live up to what we keep on the website.
>> And, without an active LTS, probably something we should take off the site?
>> >>>
>> >>> On Thu, Sep 19, 2019 at 1:33 PM Pablo Estrada 
>> wrote:
>> 
>>  +Łukasz Gajowy had at some point thought of setting up jenkins jobs
>> without coupling them to the state of the repo during the last Seed Job. It
>> may be that that improvement can help test older LTS-type releases?
>> 
>>  On Thu, Sep 19, 2019 at 1:11 PM Robert Bradshaw 
>> wrote:
>> >
>> > In many ways the 2.7 LTS was trying to flesh out the process. I
>> think

Re: Next LTS?

2020-03-24 Thread Ahmet Altay
Removing it makes sense. We did not have a good way of measuring the demand
for LTS releases.

There was a suggestion to mark the last release with Python 2 support as an
LTS release; was there a conclusion on that? ( +Valentyn Tymofieiev
 )

Ahmet

On Tue, Mar 24, 2020 at 2:34 PM Robert Bradshaw  wrote:

> There seems to have been a lack of demand. I agree we should remove
> these statements from our site until we find a reason to revisit
> doing LTS releases.
>
> On Tue, Mar 24, 2020 at 2:23 PM Austin Bennett
>  wrote:
> >
> > What's our LTS policy these days?  It seems we should remove the
> following from our site (and encourage GCP to do the same, below), if we're
> not going to maintain these.  I'll update the policy page via PR, if I get
> the go-ahead that this is our desire.  It seems we can't suggest policies
> in a policy doc that we don't follow...?
> >
> > I am not trying to suggest demand for LTS.  If others haven't spoken up,
> that also indicates a lack of demand.  The point of my message is that we
> should update our Policies doc if it isn't what we are practicing (and we
> can re-add it later if we want to revive LTS).
> >
> > https://beam.apache.org/community/policies/
> >
> > Apache Beam aims to make 8 releases in a 12 month period. To accommodate
> users with longer upgrade cycles, some of these releases will be tagged as
> long term support (LTS) releases. LTS releases receive patches to fix major
> issues for 12 months, starting from the release’s initial release date.
> There will be at least one new LTS release in a 12 month period, and LTS
> releases are considered deprecated after 12 months. The community will mark
> a release as a LTS release based on various factors, such as the number of
> LTS releases currently in flight and whether the accumulated feature set
> since the last LTS provides significant upgrade value. Non-LTS releases do
> not receive patches and are considered deprecated immediately after the
> next following minor release. We encourage you to update early and often;
> do not wait until the deprecation date of the version you are using.
> >
> >
> >
> >
> > Seems a Google Specific Concern, but related to the community:
> https://cloud.google.com/dataflow/docs/support/sdk-version-support-status#apache-beam-sdks-2x
> >
> > Apache Beam is an open source, community-led project. Google is part of
> the community, but we do not own the project or control the release
> process. We might open bugs or submit patches to the Apache Beam codebase
> on behalf of Dataflow customers, but we cannot create hotfixes or official
> releases of Apache Beam on demand.
> >
> > However, the Apache Beam community designates specific releases as long
> term support (LTS) releases. LTS releases receive patches to fix major
> issues for a designated period of time. See the Apache Beam policies page
> for more details about release policies.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Sep 19, 2019 at 5:01 PM Ahmet Altay  wrote:
> >>
> >> I agree with retiring 2.7 as the LTS family. Based on my experience
> with users, 2.7 does not have particularly high adoption and, as pointed
> out, has known critical issues. Declaring another LTS pending demand sounds
> reasonable, but how are we going to gauge this demand?
> >>
> >> +Yifan Zou +Alan Myrvold on the tooling question as well. Unless we
> address the tooling problem it seems difficult to feasibly maintain LTS
> versions over time.
> >>
> >> On Thu, Sep 19, 2019 at 3:45 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
> >>>
> >>> To be clear, I was picking on - or reminding us of - the promise: I
> don't have a strong personal need/desire (at least currently) for LTS to
> exist.  Though, worth ensuring we live up to what we keep on the website.
> And, without an active LTS, probably something we should take off the site?
> >>>
> >>> On Thu, Sep 19, 2019 at 1:33 PM Pablo Estrada 
> wrote:
> 
>  +Łukasz Gajowy had at some point thought of setting up jenkins jobs
> without coupling them to the state of the repo during the last Seed Job. It
> may be that that improvement can help test older LTS-type releases?
> 
>  On Thu, Sep 19, 2019 at 1:11 PM Robert Bradshaw 
> wrote:
> >
> > In many ways the 2.7 LTS was trying to flesh out the process. I think
> > we learned some valuable lessons. It would have been good to push out
> > something (even if it didn't have everything we wanted) but that is
> > unlikely to be worth pursuing now (and 2.7 should probably be retired
> > as LTS and no longer recommended).
> >
> > I agree that it does not seem there is strong demand for an LTS at
> > this point. I would propose that we keep 2.16, etc. as potential
> > candidates, but only declare one as LTS pending demand. The question
> > of how to keep our tooling stable (or backwards/forwards compatible)
> > is a good one, especially as we move to drop Python 2.7 in 2020
> (which
> > could itself 

Re: [PROPOSAL] Add licenses and notices to SDK docker images

2020-03-24 Thread Robert Bradshaw
Thank you for updating the doc. As I mentioned on the PR, I do not
think we should check all 100K lines of auto-generated/pulled license
files into the repository and run separate asynchronous processes to
try to keep things in sync and fix things up as dependencies evolve.
Instead, we should populate the container licenses with what's
actually in the container, at build time, as part of the container
build process.
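
A minimal sketch of that build-time approach (illustrative only, not
Beam's actual tooling; it assumes a pip-based Python container whose
installed distributions carry license metadata):

import importlib.metadata  # Python 3.8+
import pathlib

def dump_licenses(out_dir='/opt/licenses'):
    # At image build time, write out the license metadata of every
    # Python package actually installed in the container, so the image
    # ships records for exactly what it contains.
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for dist in importlib.metadata.distributions():
        name = dist.metadata['Name'] or 'unknown'
        license_text = dist.metadata.get('License') or 'UNKNOWN'
        (out / (name + '.txt')).write_text(license_text)

if __name__ == '__main__':
    dump_licenses()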

On Tue, Mar 24, 2020 at 2:10 PM Hannah Jiang  wrote:
>
> Hi Team
>
> I added some more content to the document to discuss how to manage new
> dependencies and licenses.
>
> The main ideas are:
> 1. Run precommit tests for PRs to check if new dependencies and licenses
> are added or removed.
> 2. Run daily checks to see if license and notice contents are updated, and
> send PRs automatically to update licenses and notices.
>
> Please review the "When run dependency check" section and provide input to
> improve the process.
> Link: https://s.apache.org/eauq6
>
> Hannah
>
>
>
> On Wed, Feb 5, 2020 at 4:43 PM Hannah Jiang  wrote:
>>
>> Hello
>>
>> I wrote a design document about adding licenses and notices for third-party
>> dependencies to SDK docker images.
>> I reviewed several tools for this purpose; please recommend other tools if
>> you have any in mind, and I am happy to review those as well.
>> Link: https://s.apache.org/eauq6
>>
>> Comments of any kind are welcome.
>>
>> Thanks,
>> Hannah
>>
>>


Re: Jenkins jobs not running for my PR 10438

2020-03-24 Thread Ahmet Altay
Kind of done. 11208 is running the tests. 11156 did not start all the
tests. Someone could retry it again in a bit.

On Tue, Mar 24, 2020 at 12:11 PM Tomo Suzuki  wrote:

> Hi Beam committers,
>
> Would you trigger precommit checks for
> https://github.com/apache/beam/pull/11156 and
> https://github.com/apache/beam/pull/11208 with the following 6 checks?
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
>
> On Fri, Mar 20, 2020 at 11:13 AM Jan Lukavský  wrote:
>
>> Hi, done.
>> On 3/20/20 3:59 PM, Tomo Suzuki wrote:
>>
>> Hi Beam committers,
>> (Thanks Ahmet)
>>
>> Would you re-run the presubmit checks for
>> https://github.com/apache/beam/pull/11168 with the following commands?
>> Run Java PostCommit
>> Run Java HadoopFormatIO Performance Test
>> Run BigQueryIO Streaming Performance Test Java
>> Run Dataflow ValidatesRunner
>> Run Spark ValidatesRunner
>> Run SQL Postcommit
>>
>>
>> On Wed, Mar 18, 2020 at 9:09 PM Ahmet Altay  wrote:
>>
>>> Done.
>>>
>>> On Wed, Mar 18, 2020 at 5:57 PM Tomo Suzuki  wrote:
>>>
 Hi Beam committers,
 (Alexey, thank you!)

 1. Would you run the 2 failed checks Java PreCommit and Python
 PreCommit for https://github.com/apache/beam/pull/11156

 2. Would you run the precommit checks for
 https://github.com/apache/beam/pull/11168 with the following 6
 commands?
 Run Java PostCommit
 Run Java HadoopFormatIO Performance Test
 Run BigQueryIO Streaming Performance Test Java
 Run Dataflow ValidatesRunner
 Run Spark ValidatesRunner
 Run SQL Postcommit





 On Wed, Mar 18, 2020 at 1:08 PM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Done
>
> On 18 Mar 2020, at 15:25, Tomo Suzuki  wrote:
>
> Hi Beam committers,
>
> Would you trigger the precommit checks for
> https://github.com/apache/beam/pull/11156 with the following 6
> commands ?
>
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
> Regards,
> Tomo
>
>
>

 --
 Regards,
 Tomo

>>>
>>
>> --
>> Regards,
>> Tomo
>>
>>
>
> --
> Regards,
> Tomo
>


Re: Next LTS?

2020-03-24 Thread Robert Bradshaw
There seems to have been a lack of demand. I agree we should remove
these statements from our site until we find a reason to revisit
doing LTS releases.

On Tue, Mar 24, 2020 at 2:23 PM Austin Bennett
 wrote:
>
> What's our LTS policy these days?  It seems we should remove the following
> from our site (and encourage GCP to do the same, below), if we're not going
> to maintain these.  I'll update the policy page via PR, if I get the
> go-ahead that this is our desire.  It seems we can't suggest policies in a
> policy doc that we don't follow...?
>
> I am not trying to suggest demand for LTS.  If others haven't spoken up,
> that also indicates a lack of demand.  The point of my message is that we
> should update our Policies doc if it isn't what we are practicing (and we
> can re-add it later if we want to revive LTS).
>
> https://beam.apache.org/community/policies/
>
> Apache Beam aims to make 8 releases in a 12 month period. To accommodate 
> users with longer upgrade cycles, some of these releases will be tagged as 
> long term support (LTS) releases. LTS releases receive patches to fix major 
> issues for 12 months, starting from the release’s initial release date. There 
> will be at least one new LTS release in a 12 month period, and LTS releases 
> are considered deprecated after 12 months. The community will mark a release 
> as a LTS release based on various factors, such as the number of LTS releases 
> currently in flight and whether the accumulated feature set since the last 
> LTS provides significant upgrade value. Non-LTS releases do not receive 
> patches and are considered deprecated immediately after the next following 
> minor release. We encourage you to update early and often; do not wait until 
> the deprecation date of the version you are using.
>
>
>
>
> Seems a Google Specific Concern, but related to the community:  
> https://cloud.google.com/dataflow/docs/support/sdk-version-support-status#apache-beam-sdks-2x
>
> Apache Beam is an open source, community-led project. Google is part of the 
> community, but we do not own the project or control the release process. We 
> might open bugs or submit patches to the Apache Beam codebase on behalf of 
> Dataflow customers, but we cannot create hotfixes or official releases of 
> Apache Beam on demand.
>
> However, the Apache Beam community designates specific releases as long term 
> support (LTS) releases. LTS releases receive patches to fix major issues for 
> a designated period of time. See the Apache Beam policies page for more 
> details about release policies.
>
>
>
>
>
>
>
>
>
> On Thu, Sep 19, 2019 at 5:01 PM Ahmet Altay  wrote:
>>
>> I agree with retiring 2.7 as the LTS family. Based on my experience with
>> users, 2.7 does not have particularly high adoption and, as pointed out,
>> has known critical issues. Declaring another LTS pending demand sounds
>> reasonable, but how are we going to gauge this demand?
>>
>> +Yifan Zou +Alan Myrvold on the tooling question as well. Unless we address 
>> the tooling problem it seems difficult to feasibly maintain LTS versions 
>> over time.
>>
>> On Thu, Sep 19, 2019 at 3:45 PM Austin Bennett  
>> wrote:
>>>
>>> To be clear, I was picking on - or reminding us of - the promise: I don't 
>>> have a strong personal need/desire (at least currently) for LTS to exist.  
>>> Though, worth ensuring we live up to what we keep on the website.  And, 
>>> without an active LTS, probably something we should take off the site?
>>>
>>> On Thu, Sep 19, 2019 at 1:33 PM Pablo Estrada  wrote:

 +Łukasz Gajowy had at some point thought of setting up jenkins jobs 
 without coupling them to the state of the repo during the last Seed Job. 
 It may be that that improvement can help test older LTS-type releases?

 On Thu, Sep 19, 2019 at 1:11 PM Robert Bradshaw  
 wrote:
>
> In many ways the 2.7 LTS was trying to flesh out the process. I think
> we learned some valuable lessons. It would have been good to push out
> something (even if it didn't have everything we wanted) but that is
> unlikely to be worth pursuing now (and 2.7 should probably be retired
> as LTS and no longer recommended).
>
> I agree that it does not seem there is strong demand for an LTS at
> this point. I would propose that we keep 2.16, etc. as potential
> candidates, but only declare one as LTS pending demand. The question
> of how to keep our tooling stable (or backwards/forwards compatible)
> is a good one, especially as we move to drop Python 2.7 in 2020 (which
> could itself be a driver for an LTS).
>
> On Thu, Sep 19, 2019 at 12:27 PM Kenneth Knowles  wrote:
> >
> > Yes, I pretty much dropped the 2.7.1 release process due to lack of
> > interest.
> >
> > There are known problems such that I cannot recommend anyone use
> > 2.7.0, yet 2.7 is the current LTS family. So my work on 2.7.1 was
> > philosophical. I did not like the fact 

Re: Next LTS?

2020-03-24 Thread Austin Bennett
What's our LTS policy these days?  It seems we should remove the following
from our site (and encourage GCP to do the same, below), if we're not going
to maintain these.  I'll update the policy page via PR, if I get the
go-ahead that this is our desire.  It seems we can't suggest policies in a
policy doc that we don't follow...?

I am not trying to suggest demand for LTS.  If others haven't spoken up,
that also indicates a lack of demand.  The point of my message is that we
should update our Policies doc if it isn't what we are practicing (and we
can re-add it later if we want to revive LTS).

https://beam.apache.org/community/policies/

Apache Beam aims to make 8 releases in a 12 month period. To accommodate
users with longer upgrade cycles, some of these releases will be tagged as
long term support (LTS) releases. LTS releases receive patches to fix major
issues for 12 months, starting from the release’s initial release date.
There will be at least one new LTS release in a 12 month period, and LTS
releases are considered deprecated after 12 months. The community will mark
a release as a LTS release based on various factors, such as the number of
LTS releases currently in flight and whether the accumulated feature set
since the last LTS provides significant upgrade value. Non-LTS releases do
not receive patches and are considered deprecated immediately after the
next following minor release. We encourage you to update early and often;
do not wait until the deprecation date of the version you are using.



Seems a Google Specific Concern, but related to the community:
https://cloud.google.com/dataflow/docs/support/sdk-version-support-status#apache-beam-sdks-2x

Apache Beam  is an open source, community-led
project. Google is part of the community, but we do not own the project or
control the release process. We might open bugs or submit patches to the
Apache Beam codebase on behalf of Dataflow customers, but we cannot create
hotfixes or official releases of Apache Beam on demand.

However, the Apache Beam community designates specific releases as *long
term support (LTS)* releases. LTS releases receive patches to fix major
issues for a designated period of time. See the Apache Beam policies
 page for more details about
release policies.









On Thu, Sep 19, 2019 at 5:01 PM Ahmet Altay  wrote:

> I agree with retiring 2.7 as the LTS family. Based on my experience with
> users, 2.7 does not have particularly high adoption and, as pointed out,
> has known critical issues. Declaring another LTS pending demand sounds
> reasonable, but how are we going to gauge this demand?
>
> +Yifan Zou  +Alan Myrvold  on
> the tooling question as well. Unless we address the tooling problem it
> seems difficult to feasibly maintain LTS versions over time.
>
> On Thu, Sep 19, 2019 at 3:45 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> To be clear, I was picking on - or reminding us of - the promise: I don't
>> have a strong personal need/desire (at least currently) for LTS to exist.
>> Though, worth ensuring we live up to what we keep on the website.  And,
>> without an active LTS, probably something we should take off the site?
>>
>> On Thu, Sep 19, 2019 at 1:33 PM Pablo Estrada  wrote:
>>
>>> +Łukasz Gajowy  had at some point thought of
>>> setting up jenkins jobs without coupling them to the state of the repo
>>> during the last Seed Job. It may be that that improvement can help test
>>> older LTS-type releases?
>>>
>>> On Thu, Sep 19, 2019 at 1:11 PM Robert Bradshaw 
>>> wrote:
>>>
 In many ways the 2.7 LTS was trying to flesh out the process. I think
 we learned some valuable lessons. It would have been good to push out
 something (even if it didn't have everything we wanted) but that is
 unlikely to be worth pursuing now (and 2.7 should probably be retired
 as LTS and no longer recommended).

 I agree that it does not seem there is strong demand for an LTS at
 this point. I would propose that we keep 2.16, etc. as potential
 candidates, but only declare one as LTS pending demand. The question
 of how to keep our tooling stable (or backwards/forwards compatible)
 is a good one, especially as we move to drop Python 2.7 in 2020 (which
 could itself be a driver for an LTS).

 On Thu, Sep 19, 2019 at 12:27 PM Kenneth Knowles 
 wrote:
 >
 > Yes, I pretty much dropped the 2.7.1 release process due to lack of
 interest.
 >
 > There are known problems such that I cannot recommend anyone use
 2.7.0, yet 2.7 is the current LTS family. So my work on 2.7.1 was
 philosophical. I did not like the fact that we had a designated LTS family
 with no usable releases.
 >
 > But many backports were proposed to block 2.7.1 and took a very long
 time to get contributors to implement the backports. I ended up doing many
 of them just to move it along. This indicates a 

Re: [PROPOSAL] Add licenses and notices to SDK docker images

2020-03-24 Thread Hannah Jiang
Hi Team

I added some more content to the document to discuss how to manage new
dependencies and licenses.

The main ideas are:
1. Run precommit tests for PRs to check if new dependencies and licenses are
added or removed (a sketch of such a check appears below).
2. Run daily checks to see if license and notice contents are updated, and
send PRs automatically to update licenses and notices.

Please review the *When run dependency check* section and provide input to
improve the process.
Link: https://s.apache.org/eauq6
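
As an illustration of idea 1, a hypothetical precommit-style check (a
sketch only, not the proposed implementation; the manifest file name is a
placeholder):

import importlib.metadata  # Python 3.8+
import pathlib
import sys

def check_dependency_manifest(manifest_path='license_manifest.txt'):
    # Compare the set of installed Python distributions against a
    # checked-in manifest; fail the precommit if dependencies were added
    # or removed without updating the manifest.
    recorded = set(pathlib.Path(manifest_path).read_text().split())
    installed = {dist.metadata['Name']
                 for dist in importlib.metadata.distributions()}
    added, removed = installed - recorded, recorded - installed
    if added or removed:
        print('Dependencies changed: added=%s removed=%s'
              % (sorted(added), sorted(removed)))
        sys.exit(1)

if __name__ == '__main__':
    check_dependency_manifest()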

Hannah



On Wed, Feb 5, 2020 at 4:43 PM Hannah Jiang  wrote:

> Hello
>
> I wrote a design document about adding licenses and notices for third-party
> dependencies to SDK docker images.
> I reviewed several tools for this purpose; please recommend other tools if
> you have any in mind, and I am happy to review those as well.
> Link: https://s.apache.org/eauq6
>
> Comments of any kind are welcome.
>
> Thanks,
> Hannah
>
>
>


[GSoC 2020 Proposal] BEAM-6807: Implement an Azure blobstore filesystem for Python SDK

2020-03-24 Thread Badrul Chowdhury
Hi All,

I would love to hear your thoughts on my proposal for adding Python SDK
support for Azure Blob Store I/O:
https://docs.google.com/document/d/173e_gnDclwavqobiNjwxRlo9D1xjaZat98g6Yax0kGQ/edit?usp=sharing

Stay safe!

Thanks,
Badrul


Re: Are docker image tags shared within a jenkins worker?

2020-03-24 Thread Hannah Jiang
This can be done by 1) passing "-Pdocker-tag=xxx" to the test and 2) making
sure to specify the custom tag when using the docker images.
For example, *:sdks:python:test-suites:portable:py35:preCommitPy35
-Pdocker-tag=20200324* will create an image with the tag 20200324.
The *--environment_config=path/to/container/image* pipeline option can be
used in a Python pipeline to pass a custom docker image.
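
Putting the two steps together (a hypothetical invocation, assuming the
default DOCKER environment type; the tag value and the pipeline script
name are placeholders):

  ./gradlew :sdks:python:test-suites:portable:py35:preCommitPy35 \
      -Pdocker-tag=20200324
  python my_pipeline.py \
      --runner=PortableRunner \
      --environment_config=apache/beam_python3.5_sdk:20200324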



On Tue, Mar 24, 2020 at 11:42 AM Brian Hulette  wrote:

> Failing run:
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink_PR/65/
> Passing run:
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink_PR/66/
>
> On Tue, Mar 24, 2020 at 11:33 AM Hannah Jiang 
> wrote:
>
>> Hi Brian
>>
>> I think that's possible if we use the default tag for the Jenkins tests.
>> To prevent this, we can use a customized tag (for example, a timestamp)
>> for each build.
>> Can you please point me to the failing tests? I will check more details.
>>
>> Thanks,
>> Hannah
>>
>>
>> On Tue, Mar 24, 2020 at 10:11 AM Brian Hulette 
>> wrote:
>>
>>> I ran into a test failure on the XVR tests in [1] which looked like the
>>> test was executing with a python docker container that did _not_ include
>>> the python changes in my PR. The test ran successfully after a second run.
>>>
>>> It seems likely that the initial failure occurred because some other job
>>> was running concurrently on the same jenkins worker and overwrote the `
>>> apache/beam_python2.7_sdk:2.21.0.dev` image that my run had generated.
>>> Is this possible? If so, is there something we should do to isolate these
>>> images?
>>>
>>> [1] https://github.com/apache/beam/pull/10055
>>>
>>


[BEAM-9322] Python SDK discussion on correct output tag names

2020-03-24 Thread Sam Rohde
Hi All,

*Problem*
I would like to discuss BEAM-9322 and the correct way to set the output
tags of a transform with nested PCollections, e.g. a dict of PCollections
or a tuple of dicts of PCollections. Before the fix for BEAM-1833, the
Python SDK, when applying a PTransform, would auto-generate the output tags
for the output PCollections even if they were manually set by the user:

class MyComposite(beam.PTransform):
  def expand(self, pcoll):
    a = PCollection.from_(pcoll)
    a.tag = 'a'

    b = PCollection.from_(pcoll)
    b.tag = 'b'
    return (a, b)

would yield a PTransform with two output PCollections whose output tags were
'None' and '0' instead of 'a' and 'b'. This was corrected for simple cases
like this one. However, the fix fails when the PCollections share the same
output tag (of course). This can happen like so:

class MyComposite(beam.PTransform):
  def expand(self, pcoll):
    partition_1 = beam.Partition(pcoll, ...)
    partition_2 = beam.Partition(pcoll, ...)
    return (partition_1[0], partition_2[0])

With the new code, this leads to an error because both output PCollections
have an output tag of '0'.

*Proposal*
When applying PTransforms to a pipeline (pipeline.py:550), we name the
PCollections according to their position in the tree, concatenated with the
PCollection tag and a delimiter. In the first example, the output
PCollections of the applied transform would be '0.a' and '1.b', because the
result is a tuple of PCollections. In the second example, the outputs would
be '0.0' and '1.0'. In the case of a dict of PCollections, the tags should
simply be the keys of the dict.
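
As a rough illustration of the proposed naming rule (a hypothetical
helper, not the actual pipeline.py change):

def _nested_output_tags(outputs):
  # Hypothetical sketch of the proposed naming scheme.
  if isinstance(outputs, dict):
    # For a dict of PCollections, use the dict keys directly.
    return list(outputs.keys())
  # For a tuple of PCollections, concatenate the position and the
  # PCollection's own tag with a '.' delimiter: '0.a' and '1.b' in the
  # first example, '0.0' and '1.0' in the second.
  return ['%d.%s' % (i, pc.tag) for i, pc in enumerate(outputs)]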

What do you think? Am I missing edge cases? Will this be unexpected to
users? Will this break people who rely on the generated PCollection output
tags?

Regards,
Sam


Re: Jenkins jobs not running for my PR 10438

2020-03-24 Thread Tomo Suzuki
Hi Beam committers,

Would you trigger precommit checks for
https://github.com/apache/beam/pull/11156 and
https://github.com/apache/beam/pull/11208 with the following 6 checks?
Run Java PostCommit
Run Java HadoopFormatIO Performance Test
Run BigQueryIO Streaming Performance Test Java
Run Dataflow ValidatesRunner
Run Spark ValidatesRunner
Run SQL Postcommit


On Fri, Mar 20, 2020 at 11:13 AM Jan Lukavský  wrote:

> Hi, done.
> On 3/20/20 3:59 PM, Tomo Suzuki wrote:
>
> Hi Beam committers,
> (Thanks Ahmet)
>
> Would you re-run the presubmit checks for
> https://github.com/apache/beam/pull/11168 with the following commands?
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
>
> On Wed, Mar 18, 2020 at 9:09 PM Ahmet Altay  wrote:
>
>> Done.
>>
>> On Wed, Mar 18, 2020 at 5:57 PM Tomo Suzuki  wrote:
>>
>>> Hi Beam committers,
>>> (Alexey, thank you!)
>>>
>>> 1. Would you run the 2 failed checks Java PreCommit and Python PreCommit
>>> for https://github.com/apache/beam/pull/11156
>>>
>>> 2. Would you run the precommit checks for
>>> https://github.com/apache/beam/pull/11168 with the following 6 commands?
>>> Run Java PostCommit
>>> Run Java HadoopFormatIO Performance Test
>>> Run BigQueryIO Streaming Performance Test Java
>>> Run Dataflow ValidatesRunner
>>> Run Spark ValidatesRunner
>>> Run SQL Postcommit
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Mar 18, 2020 at 1:08 PM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 Done

 On 18 Mar 2020, at 15:25, Tomo Suzuki  wrote:

 Hi Beam committers,

 Would you trigger the precommit checks for
 https://github.com/apache/beam/pull/11156 with the following 6
 commands ?

 Run Java PostCommit
 Run Java HadoopFormatIO Performance Test
 Run BigQueryIO Streaming Performance Test Java
 Run Dataflow ValidatesRunner
 Run Spark ValidatesRunner
 Run SQL Postcommit

 Regards,
 Tomo



>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>
>
> --
> Regards,
> Tomo
>
>

-- 
Regards,
Tomo


Re: Are docker image tags shared within a jenkins worker?

2020-03-24 Thread Brian Hulette
Failing run: https://builds.apache.org/job/beam_PostCommit_XVR_Flink_PR/65/
Passing run: https://builds.apache.org/job/beam_PostCommit_XVR_Flink_PR/66/

On Tue, Mar 24, 2020 at 11:33 AM Hannah Jiang 
wrote:

> Hi Brian
>
> I think that's possible if we use the default tag for the Jenkins tests.
> To prevent this, we can use a customized tag (for example, a timestamp) for
> each build.
> Can you please point me to the failing tests? I will check more details.
>
> Thanks,
> Hannah
>
>
> On Tue, Mar 24, 2020 at 10:11 AM Brian Hulette 
> wrote:
>
>> I ran into a test failure on the XVR tests in [1] which looked like the
>> test was executing with a python docker container that did _not_ include
>> the python changes in my PR. The test ran successfully after a second run.
>>
>> It seems likely that the initial failure occurred because some other job
>> was running concurrently on the same jenkins worker and overwrote the `
>> apache/beam_python2.7_sdk:2.21.0.dev` image that my run had generated.
>> Is this possible? If so, is there something we should do to isolate these
>> images?
>>
>> [1] https://github.com/apache/beam/pull/10055
>>
>


Re: Are docker image tags shared within a jenkins worker?

2020-03-24 Thread Hannah Jiang
Hi Brian

I think that's possible if we use the default tag for the Jenkins tests. To
prevent this, we can use a customized tag (for example, a timestamp) for
each build.
Can you please point me to the failing tests? I will check more details.

Thanks,
Hannah


On Tue, Mar 24, 2020 at 10:11 AM Brian Hulette  wrote:

> I ran into a test failure on the XVR tests in [1] which looked like the
> test was executing with a python docker container that did _not_ include
> the python changes in my PR. The test ran successfully after a second run.
>
> It seems likely that the initial failure occurred because some other job
> was running concurrently on the same jenkins worker and overwrote the `
> apache/beam_python2.7_sdk:2.21.0.dev` image that my run had generated. Is
> this possible? If so, is there something we should do to isolate these
> images?
>
> [1] https://github.com/apache/beam/pull/10055
>


Are docker image tags shared within a jenkins worker?

2020-03-24 Thread Brian Hulette
I ran into a test failure on the XVR tests in [1] which looked like the
test was executing with a python docker container that did _not_ include
the python changes in my PR. The test ran successfully after a second run.

It seems likely that the initial failure occurred because some other job
was running concurrently on the same jenkins worker and overwrote the `
apache/beam_python2.7_sdk:2.21.0.dev` image that my run had generated. Is
this possible? If so, is there something we should do to isolate these
images?

[1] https://github.com/apache/beam/pull/10055


Beam Dependency Check Report (2020-03-24)

2020-03-24 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:

  Dependency Name        | Current Version | Latest Version | Current Version Release Date | Latest Release Date | JIRA Issue
  -----------------------+-----------------+----------------+------------------------------+---------------------+-----------
  google-cloud-datastore | 1.7.4           | 1.11.0         | 2019-05-27                   | 2020-03-24          | BEAM-8443
  google-cloud-pubsub    | 1.0.2           | 1.4.1          | 2019-12-23                   | 2020-03-24          | BEAM-5539
  google-cloud-vision    | 0.42.0          | 1.0.0          | None                         | 2020-03-24          | BEAM-9581
  grpcio-tools           | 1.14.2          | 1.27.2         | None                         | 2020-03-24          | BEAM-9582
  httplib2               | 0.12.0          | 0.17.0         | 2018-12-10                   | 2020-01-27          | BEAM-9018
  mock                   | 2.0.0           | 3.0.5          | 2019-05-20                   | 2019-05-20          | BEAM-7369
  oauth2client           | 3.0.0           | 4.1.3          | 2018-12-10                   | 2018-12-10          | BEAM-6089
  prompt-toolkit         | 1.0.18          | 2.0.10         | None                         | 2020-03-24          | BEAM-9583
  tenacity               | 5.1.5           | 6.1.0          | 2019-11-11                   | 2020-03-24          | BEAM-8607
  tox                    | 3.11.1          | 3.14.5         | None                         | 2020-03-24          | BEAM-9584

High Priority Dependency Updates Of Beam Java SDK:

  Dependency Name | Current Version | Latest Version | Current Version Release Date | Latest Release Date | JIRA Issue
  ----------------+-----------------+----------------+------------------------------+---------------------+-----------
  com.alibaba:fastjson | 1.2.49 | 1.2.67 | 2018-08-04 | 2020-03-19 | BEAM-8632
  com.datastax.cassandra:cassandra-driver-core | 3.8.0 | 4.0.0 | 2019-10-29 | 2019-03-18 | BEAM-8674
  com.esotericsoftware:kryo | 4.0.2 | 5.0.0-RC5 | 2018-03-20 | 2020-03-08 | BEAM-5809
  com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
  com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.20.0 | 0.28.0 | 2019-02-11 | 2020-02-24 | BEAM-6645
  com.github.luben:zstd-jni | 1.3.8-3 | 1.4.4-7 | 2019-01-29 | 2020-01-24 | BEAM-9194
  com.github.spotbugs:spotbugs | 3.1.12 | 4.0.1 | 2019-03-01 | 2020-03-18 | BEAM-7792
  com.github.spotbugs:spotbugs-annotations | 3.1.12 | 4.0.1 | 2019-03-01 | 2020-03-18 | BEAM-6951
  com.google.api.grpc:grpc-google-common-protos | 1.12.0 | 1.17.0 | 2018-06-29 | 2019-10-04 | BEAM-8633
  com.google.api.grpc:proto-google-cloud-bigquerystorage-v1beta1 | 0.85.1 | 0.91.0 | 2020-01-08 | 2020-03-10 | BEAM-8678
  com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1 | 1.49.1 | 1.52.0 | 2020-01-28 | 2020-03-20 | BEAM-8682
  com.google.api.grpc:proto-google-common-protos | 1.12.0 | 1.17.0 | 2018-06-29 | 2019-10-04 | BEAM-6899
  com.google.apis:google-api-services-clouddebugger | v2-rev20191003-1.30.3 | v2-rev20200313-1.30.9 | 2019-10-19 | 2020-03-24 | BEAM-8750
  com.google.apis:google-api-services-cloudresourcemanager | v1-rev20191206-1.30.3 | v2-rev20200210-1.30.9 | 2019-12-17 | 2020-03-05 | BEAM-8751
  com.google.apis:google-api-services-dataflow | v1b3-rev20190927-1.30.3 | v1beta3-rev12-1.20.0 | 2019-10-11 | 2015-04-29 | BEAM-8752
  com.google.apis:google-api-services-pubsub | v1-rev2019-1.30.3 | v1-rev20200312-1.30.9 | 2019-11-26 | 2020-03-24 | BEAM-8753
  com.google.apis:google-api-services-storage | v1-rev20191011-1.30.3 | v1-rev20200226-1.30.9 | 2019-10-30 | 2020-03-16 | BEAM-8754
  com.google.cloud:google-cloud-spanner | 1.49.1 | 1.52.0 | 2020-01-28 | 2020-03-20 | BEAM-8758
  com.google.guava:guava-testlib | 25.1-jre | 28.2-jre | 2018-05-23 | 2019-12-27 | BEAM-8760
  com.google.protobuf.nano:protobuf-javanano | 3.0.0-alpha-5 | 3.2.0rc2 | 2016-01-06 | 2017-01-19 | BEAM-9098
  com.hazelcast:hazelcast | 3.12 | 4.0 | 2019-04-09 | 2020-02-04 | BEAM-8636
  com.hazelcast.jet:hazelcast-jet | 3.0 | 4.0 | 2019-04-11 | 2020-03-02 | BEAM-9586
  com.hazelcast.jet:hazelcast-jet-core | 3.0 | 4.0 | 2019-04-11 | 2020-03-02 | BEAM-9587
  com.ning:compress-lzf | 1.0.3 | 1.0.4 | 2014-08-16 | 2017-03-14 | BEAM-9100
  com.pholser:junit-quickcheck-core | 0.8 | 0.9.1 | 2018-03-08 | 2020-01-21 | BEAM-8699
  io.netty:netty-handler | 4.1.30.Final | 5.0.0.Alpha2 | 2018-09-27 | 2015-03-03 | BEAM-8703
  javax.servlet:javax.servlet-api | 3.1.0 | 4.0.1 | 2013-04-25 | 2018-04-20 | BEAM-5750
  javax.xml.bind:jaxb-api | 2.2.12 | 2.4.0-b180830.0359 | | |
  

Re: Thoughts on Covid19

2020-03-24 Thread Jan Lukavský

Hi,

I've put down a concrete proposal [1] for a data collection platform
that could help not with the current situation, but with what might
follow. I'd appreciate any feedback from this community, as I think there
are many people here who can give their insights regarding the usability
of such a concept, data security, feasibility, etc. Note again that the
purpose of this proposal is not to target the current situation, but to
enable "softer" variants of quarantines after the first wave is dealt
with (if needed, and I'm not implying that it will be).


Many thanks in advance for any comments,

 Jan

[1] 
https://docs.google.com/document/d/1HPRV1SriRd2v95r2I_MYcRkJgiLZs27wPuowigsmO70/edit?usp=sharing


On 3/19/20 9:19 AM, Jan Lukavský wrote:


Cool, thanks for the link. I had in mind a perhaps smaller and more
focused team with a reasonably small and well-defined problem to
solve, but this might also be an option.


On 3/19/20 1:45 AM, Valentyn Tymofieiev wrote:
Saw this website on HN today, where crowdsourcing of various ideas
is discussed:


https://helpwithcovid.com/
https://news.ycombinator.com/item?id=22615453

On Wed, Mar 18, 2020 at 4:30 PM Seetharam Venkatesh
<venkat...@innerzeal.com> wrote:


Can we not use NextDoor, which already connects communities?

On Wed, Mar 18, 2020 at 4:01 PM Alex Amato <ajam...@google.com> wrote:

Well, you could try scaling it as an app to connect people. A simple web
architecture would be fastest to set up.

But I think a lot of people won't be able to use an app; if you had a phone
number with some operators to collect their information, then it could be
possible to get those users assistance.
There might be some privacy and security issues around taking and
publishing people's information, so I am not too sure how to navigate that.

On Wed, Mar 18, 2020 at 3:48 PM Jan Lukavský <je...@seznam.cz> wrote:

Hi Alex,

great idea, thanks for that! Can we think of a solution that would be a
little more scalable? Can we (e.g. via a mobile app) help connect people
who need help with people who might offer help? Can we do this in a
reasonable time?

On 3/18/20 11:42 PM, Alex Amato wrote:

Here is one thing many people could do:
- Contact your neighbors (leave a note on their door with your phone
number) and find out if anyone is high risk and does not want to risk
leaving their home. If you are lower risk and willing to go out, insist
that you can help them and obtain supplies for them, or help them order
online if they don't know how.
- If there are neighbours who live alone, also give them your phone
number. Help keep track of them in case they get sick.

A more technical and far-fetched idea:
- Building custom ventilators. In some locations they are already out of
respirators, and they will need more. You could donate these to a
hospital, though I am not sure if they would use them (but they might be
willing to if there is no other option). There are a few blogs on how to
build these from supplies available in a crisis. With a little bit of DIY
know-how, it may be possible to build a few. Even a few low-quality
ventilators could save some lives. Though, it may be possible there are
more skilled people or local shops already doing this. Helping them get
supplies and funds is another option.
https://www.instructables.com/id/The-Pandemic-Ventilator/



On Wed, Mar 18, 2020 at 3:27 PM Jan Lukavský <je...@seznam.cz> wrote:

Hi,

I'm taking this opportunity to speak to this "streaming first" and
"data-driven" community to try to do a little brainstorming. I'm not
trying to create any panic; I'd like to start a serious discussion about
solving a problem. I'm well aware this is not the primary use-case for
this mailing list, but we are in a sort of special situation. I think we
might share know-how that could help people, and so we could take
advantage of that. Currently, the biggest concern (at least in Europe)
seems to be separating people as much as possible. My questions would be:

  - Can we try to think of ways to help people achieve better
separation? There are places people must go to (e.g. shopping for food);
can we help plan this so that there are 

Re: FlinkRequiresStableInput test is very flaky

2020-03-24 Thread Ismaël Mejía
Just saw that Luke created
https://issues.apache.org/jira/browse/BEAM-9578 so good to go on that
one.

On Tue, Mar 24, 2020 at 10:12 AM Ismaël Mejía  wrote:
>
> Oh, that looks like a considerable bug and could explain the not-so-good
> performance of the Portable VR tests (if this is happening for every call,
> as you mention). It looks even more worthy of a JIRA than the
> FlinkRequiresStableInputTest.
>
>
>
> On Mon, Mar 23, 2020 at 7:42 PM Reuven Lax  wrote:
> >
> > It looks like every call to getParDoPayload tries to zip up the entire 
> > staging directory. This is pretty bad, as we call getParDoPayload in many 
> > places, and most of those places don't expect it to have such a side effect.
> >
> > On Mon, Mar 23, 2020 at 11:38 AM Reuven Lax  wrote:
> >>
> >> I can file a JIRA, but does anyone know why this test is so flaky? Do we 
> >> simply need to give it longer to run?
> >>
> >> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1912/testReport/junit/org.apache.beam.runners.flink/FlinkRequiresStableInputTest/testParDoRequiresStableInput/


Re: Hello Beam Community!

2020-03-24 Thread Karolina Rosół
Welcome Brittany :-)

Karolina Rosół
Polidea  | Project Manager

M: +48 606 630 236
E: karolina.ro...@polidea.com



On Sat, Mar 14, 2020 at 1:43 AM Reza Rokni  wrote:

> Welcome!
>
> On Sat, 14 Mar 2020, 01:27 Tomo Suzuki,  wrote:
>
>> Welcome.
>>
>> On Fri, Mar 13, 2020 at 1:20 PM Udi Meiri  wrote:
>>
>>> Welcome!
>>>
>>>
>>> On Fri, Mar 13, 2020 at 9:47 AM Yichi Zhang  wrote:
>>>
 Welcome!

 On Fri, Mar 13, 2020 at 9:40 AM Ahmet Altay  wrote:

> Welcome Brittany!
>
> On Thu, Mar 12, 2020 at 6:32 PM Brittany Hermann 
> wrote:
>
>> Hello Beam Community!
>>
>> My name is Brittany Hermann and I recently joined the Open Source
>> team in Data Analytics at Google. As a Program Manager, I will be 
>> focusing
>> on community engagement while getting to work on Apache Beam and Airflow
>> projects! I have always thrived on creating healthy, diverse, and overall
>> happy communities and am excited to bring that to the team. For a fun 
>> fact,
>> I am a big Wisconsin Badgers Football fan and have a goldendoodle puppy
>> named Ollie!
>>
>> I look forward to collaborating with you all!
>>
>> Kind regards,
>>
>> Brittany Hermann
>>
>>
>>
>>
>> --
>> Regards,
>> Tomo
>>
>


Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-24 Thread Ismaël Mejía
Forgot to mention that one particularly pesky issue we found in the work on
Redshift is being able to write unit tests for this.

Is there an embedded version of Snowflake to run those? If possible, I would
also like to get some ideas on how to test this use case.

Also, we should probably ensure that the FileIO part is generic enough that
we can use S3 as well, because users may be using Snowflake on AWS too.

On Tue, Mar 24, 2020 at 10:10 AM Ismaël Mejía  wrote:

> Great!
> It seems this pattern (COPY + parallel file read) is becoming a standard
> for 'data warehouses'; we are using something similar in the (WIP) AWS
> Redshift PR. For details: https://github.com/apache/beam/pull/10206
>
> It may be worth it for all of us to check and see if we can converge the
> implementations as much as possible to provide users a consistent
> experience.
>
>
> On Tue, Mar 24, 2020 at 10:02 AM Elias Djurfeldt <
> elias.djurfe...@mirado.com> wrote:
>
>> Awesome job! I'm very interested in the cross-language support.
>>
>> Cheers,
>>
>> On Tue, 24 Mar 2020 at 01:20, Chamikara Jayalath 
>> wrote:
>>
>>> Sounds great. It looks like the operation of the Snowflake source will be
>>> similar to the BigQuery source (export files to GCS and read the files).
>>> This will allow you to better parallelize reading (the current JDBC source
>>> is limited to one worker when reading).
>>>
>>> It seems you already support initial splitting using files -
>>> https://github.com/PolideaInternal/beam/blob/snowflake-io/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java#L374
>>> Probably also consider supporting dynamic work rebalancing when runners
>>> support this through SDF.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>
>>>
>>> On Mon, Mar 23, 2020 at 9:49 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 Great! It is always welcome to have more IOs in Beam. I'd be happy
 to take a look at your PR once it is created.

 Just a couple of questions for now.

 1) Afaik, you can connect to Snowflake using the standard JDBC driver. Do
 you plan to compare the performance of this SnowflakeIO and Beam's JdbcIO?
 2) Are you going to support staging in other locations, like S3 and
 Azure?
 3) Does withSchema() allow inferring the Snowflake schema as a Beam schema?

 On 23 Mar 2020, at 15:23, Katarzyna Kucharczyk 
 wrote:

 Hi all,

My colleagues and I have developed a new Java connector for Snowflake
that we would like to add to Beam.

Snowflake is an analytic data warehouse provided as
Software-as-a-Service (SaaS). It uses a new SQL database engine with a
unique architecture designed for the cloud. For more details, please
check [1] and [2].

The proposed Snowflake IOs use the Snowflake JDBC library [3]. The IOs are
batch write and batch read, both of which use the Snowflake COPY [4]
operation underneath. In both cases, ParDos load files onto a stage, and the
files are then inserted into the Snowflake table of choice using the COPY
API. The currently supported stage is Google Cloud Storage [5].

A diagram of how the Snowflake Read IO works (the write operation works
similarly, but in the opposite direction) was attached here as an image.
Here is an Apache Beam fork [6] with the current work on the Snowflake IO.
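
For readers skimming the fork, a rough sketch of what the read path might
look like in use. The builder method names are taken loosely from the fork's
SnowflakeIO and should be treated as assumptions, and MyRow, the account,
database, bucket, and table names are all hypothetical:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.snowflake.SnowflakeIO;  // from the fork above
import org.apache.beam.sdk.values.PCollection;

public class SnowflakeReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    SnowflakeIO.DataSourceConfiguration config =
        SnowflakeIO.DataSourceConfiguration.create()
            .withUsernamePasswordAuth("user", "password")      // assumption
            .withServerName("myaccount.snowflakecomputing.com") // hypothetical
            .withDatabase("MY_DB")
            .withSchema("PUBLIC");

    PCollection<MyRow> rows =
        p.apply(
            SnowflakeIO.<MyRow>read()
                .withDataSourceConfiguration(config)
                .fromTable("MY_TABLE")
                .withStagingBucketName("my-gcs-bucket")  // GCS stage, per the proposal
                .withCsvMapper(parts -> new MyRow(parts[0], parts[1]))  // assumed 2 columns
                .withCoder(SerializableCoder.of(MyRow.class)));

    p.run().waitUntilFinish();
  }

  // Hypothetical user type for the mapped CSV records.
  public static class MyRow implements java.io.Serializable {
    final String key;
    final String value;
    MyRow(String key, String value) { this.key = key; this.value = value; }
  }
}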

In the near future we would also like to add an IO for writing streams,
which will use SnowPipe, Snowflake's mechanism for continuous loading [7].
We would also like to use cross-language transforms to provide Python
connectors as well.

We are open to all opinions and suggestions. If you have any questions or
comments, please do not hesitate to post them.

If there are no objections, I will create JIRA tickets and share them in
this thread.

Cheers,
Kasia

 [1] https://www.snowflake.com
 [2]
 https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html
 [3] https://docs.snowflake.net/manuals/user-guide/jdbc.html
 [4]
 https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
[5] https://cloud.google.com/storage
[6]
https://github.com/PolideaInternal/beam/tree/snowflake-io/sdks/java/io/snowflake
 [7]
 https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe.html



>>
>> --
>> Elias Djurfeldt
>> Mirado Consulting
>>
>


Re: FlinkRequiresStableInput test is very flaky

2020-03-24 Thread Ismaël Mejía
Oh, that looks like a considerable bug and could explain the not-so-good
performance of the Portable VR tests (if this is happening on every call,
as you mention). It looks even more worthy of a JIRA than the
FlinkRequiresStableInputTest.



On Mon, Mar 23, 2020 at 7:42 PM Reuven Lax  wrote:
>
> It looks like every call to getParDoPayload tries to zip up the entire 
> staging directory. This is pretty bad, as we call getParDoPayload in many 
> places, and most of those places don't expect it to have such a side effect.
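
As an illustration of the shape of a fix (not the actual Beam change): an
expensive, effectively idempotent step such as zipping the staging directory
can be computed at most once and cached, e.g. with Guava's
Suppliers.memoize, so repeated payload lookups stay cheap and side-effect
free. A minimal sketch with hypothetical names:

import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;

class StagedArtifacts {
  // Zip the staging directory at most once, however many callers ask
  // for the payload afterwards.
  private final Supplier<byte[]> zippedStagingDir =
      Suppliers.memoize(this::zipStagingDirectory);

  byte[] getParDoPayload() {
    return zippedStagingDir.get();  // cheap after the first call
  }

  private byte[] zipStagingDirectory() {
    // Placeholder for the expensive zip-the-directory work.
    return new byte[0];
  }
}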
>
> On Mon, Mar 23, 2020 at 11:38 AM Reuven Lax  wrote:
>>
>> I can file a JIRA, but does anyone know why this test is so flaky? Do we 
>> simply need to give it longer to run?
>>
>> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1912/testReport/junit/org.apache.beam.runners.flink/FlinkRequiresStableInputTest/testParDoRequiresStableInput/


Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-24 Thread Ismaël Mejía
Great!
It seems this pattern (COPY + parallel file read) is becoming a standard for
'data warehouses'; we are using something similar in the AWS Redshift PR
(WIP), see https://github.com/apache/beam/pull/10206 for details.

Maybe it is worth it for all of us to check and see if we can converge the
implementations as much as possible to provide users a consistent experience.
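
For concreteness, a minimal sketch of the pattern: ask the warehouse to
unload to object storage over JDBC, then read the unloaded files in parallel
with an ordinary file-based source. The connection string, stage, and table
names are hypothetical, and the unload SQL differs per warehouse (Snowflake
COPY INTO <location>, Redshift UNLOAD, ...):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;

public class CopyThenReadSketch {
  public static void main(String[] args) throws Exception {
    String jdbcUrl = "jdbc:snowflake://myaccount.snowflakecomputing.com/";
    String unloadPrefix = "gs://my-bucket/unload/";  // hypothetical

    // Step 1: the warehouse unloads the table to object storage; this is
    // the COPY half of the pattern.
    try (Connection conn = DriverManager.getConnection(jdbcUrl, "user", "password");
        Statement stmt = conn.createStatement()) {
      stmt.execute(
          "COPY INTO @my_stage/unload/ FROM my_table FILE_FORMAT = (TYPE = CSV)");
    }

    // Step 2: read the unloaded files with a parallel file-based source;
    // this is where the read parallelism comes from.
    Pipeline p = Pipeline.create();
    p.apply(TextIO.read().from(unloadPrefix + "*"));
    p.run().waitUntilFinish();
  }
}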


On Tue, Mar 24, 2020 at 10:02 AM Elias Djurfeldt 
wrote:

> Awesome job! I'm very interested in the cross-language support.
>
> Cheers,
>
> On Tue, 24 Mar 2020 at 01:20, Chamikara Jayalath 
> wrote:
>
>> Sounds great. Looks like the operation of the Snowflake source will be
>> similar to the BigQuery source (export files to GCS and read the files).
>> This will allow you to better parallelize reading (the current JDBC source
>> is limited to one worker when reading).
>>
>> Seems like you already support initial splitting using files -
>> https://github.com/PolideaInternal/beam/blob/snowflake-io/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java#L374
>> Probably also consider supporting dynamic work rebalancing when runners
>> support this through SDF.
>>
>> Thanks,
>> Cham
>>
>>
>>
>>
>> On Mon, Mar 23, 2020 at 9:49 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Great! It is always welcome to have more IOs in Beam. I’d be happy to
>>> take a look at your PR once it is created.
>>>
>>> Just a couple of questions for now.
>>>
>>> 1) Afaik, you can connect to Snowflake using the standard JDBC driver. Do
>>> you plan to compare the performance of this SnowflakeIO and Beam JdbcIO?
>>> 2) Are you going to support staging in other locations, like S3 and
>>> Azure?
>>> 3) Does “withSchema()” allow inferring the Snowflake schema as a Beam
>>> schema?
>>>
>>> On 23 Mar 2020, at 15:23, Katarzyna Kucharczyk 
>>> wrote:
>>>
>>> Hi all,
>>>
>>> My colleagues and I have developed a new Java connector for Snowflake
>>> that we would like to add to Beam.
>>>
>>> Snowflake is an analytic data warehouse provided as
>>> Software-as-a-Service (SaaS). It uses a new SQL database engine with a
>>> unique architecture designed for the cloud. For more details, please
>>> check [1] and [2].
>>>
>>> The proposed Snowflake IOs use the Snowflake JDBC library [3]. The IOs
>>> are batch write and batch read, both of which use the Snowflake COPY [4]
>>> operation underneath. In both cases, ParDos load files onto a stage, and
>>> the files are then inserted into the Snowflake table of choice using the
>>> COPY API. The currently supported stage is Google Cloud Storage [5].
>>>
>>> A diagram of how the Snowflake Read IO works (the write operation works
>>> similarly, but in the opposite direction) was attached here as an image.
>>> Here is an Apache Beam fork [6] with the current work on the Snowflake IO.
>>>
>>> In the near future we would also like to add an IO for writing streams,
>>> which will use SnowPipe, Snowflake's mechanism for continuous loading [7].
>>> We would also like to use cross-language transforms to provide Python
>>> connectors as well.
>>>
>>> We are open to all opinions and suggestions. If you have any questions or
>>> comments, please do not hesitate to post them.
>>>
>>> If there are no objections, I will create JIRA tickets and share them in
>>> this thread.
>>>
>>> Cheers,
>>> Kasia
>>>
>>> [1] https://www.snowflake.com
>>> [2]
>>> https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html
>>> [3] https://docs.snowflake.net/manuals/user-guide/jdbc.html
>>> [4] https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
>>>
>>> [5] https://cloud.google.com/storage
>>>
>>> [6]
>>> https://github.com/PolideaInternal/beam/tree/snowflake-io/sdks/java/io/snowflake
>>> [7]
>>> https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe.html
>>>
>>>
>>>
>
> --
> Elias Djurfeldt
> Mirado Consulting
>


Re: [PROPOSAL] Snowflake Java Connector for Apache Beam

2020-03-24 Thread Elias Djurfeldt
Awesome job! I'm very interested in the cross-language support.

Cheers,

On Tue, 24 Mar 2020 at 01:20, Chamikara Jayalath 
wrote:

> Sounds great. Looks like the operation of the Snowflake source will be
> similar to the BigQuery source (export files to GCS and read the files).
> This will allow you to better parallelize reading (the current JDBC source
> is limited to one worker when reading).
>
> Seems like you already support initial splitting using files -
> https://github.com/PolideaInternal/beam/blob/snowflake-io/sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java#L374
> Probably also consider supporting dynamic work rebalancing when runners
> support this through SDF.
>
> Thanks,
> Cham
>
>
>
>
> On Mon, Mar 23, 2020 at 9:49 AM Alexey Romanenko 
> wrote:
>
>> Great! It is always welcome to have more IOs in Beam. I’d be happy to
>> take a look at your PR once it is created.
>>
>> Just a couple of questions for now.
>>
>> 1) Afaik, you can connect to Snowflake using the standard JDBC driver. Do
>> you plan to compare the performance of this SnowflakeIO and Beam JdbcIO?
>> 2) Are you going to support staging in other locations, like S3 and Azure?
>> 3) Does “withSchema()” allow inferring the Snowflake schema as a Beam
>> schema?
>>
>> On 23 Mar 2020, at 15:23, Katarzyna Kucharczyk 
>> wrote:
>>
>> Hi all,
>>
>> My colleagues and I have developed a new Java connector for Snowflake
>> that we would like to add to Beam.
>>
>> Snowflake is an analytic data warehouse provided as Software-as-a-Service
>> (SaaS). It uses a new SQL database engine with a unique architecture
>> designed for the cloud. For more details, please check [1] and [2].
>>
>> The proposed Snowflake IOs use the Snowflake JDBC library [3]. The IOs are
>> batch write and batch read, both of which use the Snowflake COPY [4]
>> operation underneath. In both cases, ParDos load files onto a stage, and
>> the files are then inserted into the Snowflake table of choice using the
>> COPY API. The currently supported stage is Google Cloud Storage [5].
>>
>> A diagram of how the Snowflake Read IO works (the write operation works
>> similarly, but in the opposite direction) was attached here as an image.
>> Here is an Apache Beam fork [6] with the current work on the Snowflake IO.
>>
>> In the near future we would also like to add an IO for writing streams,
>> which will use SnowPipe, Snowflake's mechanism for continuous loading [7].
>> We would also like to use cross-language transforms to provide Python
>> connectors as well.
>>
>> We are open to all opinions and suggestions. If you have any questions or
>> comments, please do not hesitate to post them.
>>
>> If there are no objections, I will create JIRA tickets and share them in
>> this thread.
>>
>> Cheers,
>> Kasia
>>
>> [1] https://www.snowflake.com
>> [2] https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html
>>
>> [3] https://docs.snowflake.net/manuals/user-guide/jdbc.html
>> [4] https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
>> [5] https://cloud.google.com/storage
>> [6]
>> https://github.com/PolideaInternal/beam/tree/snowflake-io/sdks/java/io/snowflake
>> [7] https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe.html
>>
>>
>>
>>

-- 
Elias Djurfeldt
Mirado Consulting