Re: Self-checkpoint Support on Portable Flink

2020-10-23 Thread Boyuan Zhang
Hi there,

I just updated the doc with implementation details and opened PR13105 for review.

Thanks for your help!

On Wed, Oct 14, 2020 at 3:40 AM Maximilian Michels  wrote:

> Duplicates cannot happen because the state of all operators will be
> rolled back to the latest checkpoint, in case of failures.
>
> On 14.10.20 06:31, Reuven Lax wrote:
> > Does this mean that we have to deal with duplicate messages over the
> > back edge? Or will that not happen, since duplicates mean that we rolled
> > back a checkpoint.
> >
> > On Tue, Oct 13, 2020 at 2:59 AM Maximilian Michels wrote:
> >
> > There would be ways around the lack of checkpointing in cycles, e.g.
> > buffer and backloop only after checkpointing is complete, similar to how
> > we implement @RequiresStableInput in the Flink Runner.
> >
> > -Max
> >
> > On 07.10.20 04:05, Reuven Lax wrote:
> >  > It appears that there's a proposal
> >  > (https://cwiki.apache.org/confluence/display/FLINK/FLIP-16%3A+Loop+Fault+Tolerance)
> >  > and an abandoned PR to fix this, but AFAICT this remains a
> >  > limitation of Flink. If Flink can't guarantee processing of records
> >  > on back edges, I don't think we can use cycles, as we might
> >  > otherwise lose the residuals.
> >  >
> >  > On Tue, Oct 6, 2020 at 6:16 PM Reuven Lax wrote:
> >  >
> >  > This is what I was thinking of
> >  >
> >  > "Flink currently only provides processing guarantees for jobs
> >  > without iterations. Enabling checkpointing on an iterative job
> >  > causes an exception. In order to force checkpointing on an iterative
> >  > program the user needs to set a special flag when enabling
> >  > checkpointing: env.enableCheckpointing(interval,
> >  > CheckpointingMode.EXACTLY_ONCE, force = true).
> >  >
> >  > Please note that records in flight in the loop edges (and the state
> >  > changes associated with them) will be lost during failure."
> >  >
> >  >
> >  > On Tue, Oct 6, 2020 at 5:44 PM Boyuan Zhang wrote:
> >  >
> >  > Hi Reuven,
> >  >
> >  > As Luke mentioned, at least there are some limitations around
> >  > tracking watermark with flink cycles. I'm going to use State +
> >  > Timer without flink cycle to support self-checkpoint. For
> >  > dynamic split, we can either explore the flink cycle approach or
> >  > the limit-depth approach.
> >  >
> >  > On Tue, Oct 6, 2020 at 5:33 PM Reuven Lax wrote:
> >  >
> >  > Aren't there some limitations associated with flink cycles?
> >  > I seem to remember various features that could not be used.
> >  > I'm assuming that watermarks are not supported across
> >  > cycles, but is there anything else?
> >  >
> >  > On Tue, Oct 6, 2020 at 7:12 AM Maximilian Michels wrote:
> >  >
> >  > Thanks for starting the conversation. The two approaches
> >  > both look good to me. Probably we want to start with approach #1
> >  > for all Runners to be able to support delaying bundles. Flink
> >  > supports cycles and thus approach #2 would also be applicable and
> >  > could be used to implement dynamic splitting.
> >  >
> >  > -Max
> >  >
> >  > On 05.10.20 23:13, Luke Cwik wrote:
> >  >  > Thanks Boyuan, I left a few comments.
> >  >  >
> >  >  > On Mon, Oct 5, 2020 at 11:12 AM Boyuan Zhang wrote:
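
Max's buffering idea discussed in this thread ("buffer and backloop only after checkpointing is complete") can be sketched roughly as follows. This is an illustrative toy, not Beam or Flink API; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: park records destined for the back edge until the checkpoint
// covering them has completed, similar in spirit to @RequiresStableInput
// in the Flink runner. All names here are illustrative.
class BackEdgeBuffer<T> {
  private final List<T> pending = new ArrayList<>();
  private final List<T> releasable = new ArrayList<>();

  // Processing path: residuals are parked, not re-emitted yet.
  void buffer(T element) {
    pending.add(element);
  }

  // Checkpoint-complete callback: everything buffered before the
  // checkpoint is now stable and safe to loop back.
  void onCheckpointComplete() {
    releasable.addAll(pending);
    pending.clear();
  }

  // Elements that may be sent over the back edge without risking loss.
  List<T> drainReleasable() {
    List<T> out = new ArrayList<>(releasable);
    releasable.clear();
    return out;
  }
}
```

The point of the sketch is that a failure before `onCheckpointComplete()` only loses elements that were never acknowledged over the back edge, so replay from the checkpoint re-produces them.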

[GitHub] [beam-site] robinyqiu merged pull request #608: Publish 2.25.0 release

2020-10-23 Thread GitBox


robinyqiu merged pull request #608:
URL: https://github.com/apache/beam-site/pull/608





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[RESULT] [VOTE] Release 2.25.0, release candidate #2

2020-10-23 Thread Robin Qiu
I'm happy to announce that we have unanimously approved this release.

There are 6 approving votes, 3 of which are binding:
* Ahmet Altay
* Pablo Estrada
* Robert Bradshaw

There are no disapproving votes.

Thanks everyone!


Re: [VOTE] Release 2.25.0, release candidate #2

2020-10-23 Thread Robin Qiu
Hi everyone, we now have 3 +1's from PMC members and no -1. The vote has
passed 72 hours and I am closing it now.

On Fri, Oct 23, 2020 at 1:47 PM Robert Bradshaw  wrote:

> +1 (binding).
>
> I verified the release artifacts and signatures, and tried a couple of
> Python pipelines from an install of a wheel in a fresh virtual
> environment. All looks good to me.
>
> On Thu, Oct 22, 2020 at 4:54 PM Tyson Hamilton  wrote:
> >
> > +1
> >
> > I went through the Nexmark queries and validated the results.
> >
> > On Thu, Oct 22, 2020 at 4:43 PM Pablo Estrada 
> wrote:
> >>
> >> +1 (binding)
> >> Validated Java quickstart for Direct, Dataflow, Spark runners.
> >> Tried out a few interactive queries on InteractiveRunner on Ipython.
> >> Best
> >> -P.
> >>
> >> On Thu, Oct 22, 2020 at 4:27 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
> >>>
> >>> +1.
> >>>
> >>> Verified the internal container images for Dataflow and verified that
> the release artifacts are not installable on Python 2 and Python 3.5 (which
> could otherwise break Beam Py2/Py3.5 users who don't set an upper bound on
> Beam).
> >>>
> >>> On Wed, Oct 21, 2020 at 1:14 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
> 
>  +1 (non-binding).
> 
>  Validated Java quickstart for Direct/Dataflow runners and x-lang
> Kafka/SQL.
> 
>  Thanks,
>  Cham
> 
>  On Wed, Oct 21, 2020 at 6:03 AM Ismaël Mejía 
> wrote:
> >
> > Unrelated to the vote, but related to the Java 8/11 issue.
> >
> > We have some 'forward' compatibility tests that rely on the Beam
> daily SNAPSHOT jars
> > and they started failing two days ago; it seems the SNAPSHOTs are
> now built also with Java 11  (not sure if related)
> >
> > Filed https://issues.apache.org/jira/browse/BEAM-11080 in case
> someone can take a look
> >
> > The SNAPSHOTs should be built with Java 8 too. We use these for
> forward
> > compatibility tests and they have helped us find multiple
> regressions in the
> > past.
> >
> > On Wed, Oct 21, 2020 at 3:42 AM Ahmet Altay 
> wrote:
> >>
> >> +1 - I verified python quickstarts.
> >>
> >> On Tue, Oct 20, 2020 at 11:36 AM Robin Qiu 
> wrote:
> >>>
> >>> Hi everyone,
> >>> Please review and vote on the release candidate #2 for the version
> 2.25.0, as follows:
> >>> [ ] +1, Approve the release
> >>> [ ] -1, Do not approve the release (please provide specific
> comments)
> >>>
> >>>
> >>> The complete staging area is available for your review, which
> includes:
> >>> * JIRA release notes [1],
> >>> * the official Apache source release to be deployed to
> dist.apache.org [2], which is signed with the key with fingerprint
> AD70476B9D1AF3EFEC2208165952E71AACAF911D [3],
> >>> * all artifacts to be deployed to the Maven Central Repository [4],
> >>> * source code tag "v2.25.0-RC2" [5],
> >>> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
> >>> * Java artifacts were built with Maven 3.5.3 and OpenJDK 1.8.0
> >>> * Python artifacts are deployed along with the source release to
> the dist.apache.org [2].
> >>> * Validation sheet with a tab for 2.25.0 release to help with
> validation [9].
> >>> * Docker images published to Docker Hub [10].
> >>>
> >>> The vote will be open for at least 72 hours. It is adopted by
> majority approval, with at least 3 PMC affirmative votes.
> >>>
> >>> Thanks,
> >>> Robin
> >>>
> >>> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347147
> >>> [2] https://dist.apache.org/repos/dist/dev/beam/2.25.0/
> >>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >>> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1142
> >>> [5] https://github.com/apache/beam/tree/v2.25.0-RC2
> >>> [6] https://github.com/apache/beam/pull/13130
> >>> [7] https://github.com/apache/beam-site/pull/608
> >>> [8] https://github.com/apache/beam/pull/13131
> >>> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1494345946
> >>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>


Re: [VOTE] Release 2.25.0, release candidate #2

2020-10-23 Thread Robert Bradshaw
+1 (binding).

I verified the release artifacts and signatures, and tried a couple of
Python pipelines from an install of a wheel in a fresh virtual
environment. All looks good to me.
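
For readers unfamiliar with this kind of RC validation, it usually looks something like the sketch below. The exact file names and the RC tag on PyPI are assumptions inferred from the staging URLs in the vote email, not confirmed by this thread:

```shell
# Hedged sketch of source-release verification (file names assumed):
wget https://dist.apache.org/repos/dist/dev/beam/2.25.0/apache-beam-2.25.0-source-release.zip
wget https://dist.apache.org/repos/dist/dev/beam/2.25.0/apache-beam-2.25.0-source-release.zip.asc
wget https://dist.apache.org/repos/dist/release/beam/KEYS

gpg --import KEYS
gpg --verify apache-beam-2.25.0-source-release.zip.asc \
    apache-beam-2.25.0-source-release.zip

# Try a wheel in a fresh virtual environment (version tag hypothetical):
python3 -m venv /tmp/beam-rc && . /tmp/beam-rc/bin/activate
pip install --pre apache-beam==2.25.0rc2
```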

On Thu, Oct 22, 2020 at 4:54 PM Tyson Hamilton  wrote:
>
> +1
>
> I went through the Nexmark queries and validated the results.
>
> On Thu, Oct 22, 2020 at 4:43 PM Pablo Estrada  wrote:
>>
>> +1 (binding)
>> Validated Java quickstart for Direct, Dataflow, Spark runners.
>> Tried out a few interactive queries on InteractiveRunner on Ipython.
>> Best
>> -P.
>>
>> On Thu, Oct 22, 2020 at 4:27 PM Valentyn Tymofieiev  
>> wrote:
>>>
>>> +1.
>>>
>>> Verified the internal container images for Dataflow and verified that the 
>>> release artifacts are not installable on Python 2 and Python 3.5 (which 
>>> could otherwise break Beam Py2/Py3.5 users who don't set an upper bound on 
>>> Beam).
>>>
>>> On Wed, Oct 21, 2020 at 1:14 PM Chamikara Jayalath  
>>> wrote:

 +1 (non-binding).

 Validated Java quickstart for Direct/Dataflow runners and x-lang Kafka/SQL.

 Thanks,
 Cham

 On Wed, Oct 21, 2020 at 6:03 AM Ismaël Mejía  wrote:
>
> Unrelated to the vote, but related to the Java 8/11 issue.
>
> We have some 'forward' compatibility tests that rely on the Beam daily 
> SNAPSHOT jars
> and they started failing two days ago; it seems the SNAPSHOTs are now 
> built also with Java 11  (not sure if related)
>
> Filed https://issues.apache.org/jira/browse/BEAM-11080 in case someone 
> can take a look
>
> The SNAPSHOTs should be built with Java 8 too. We use these for forward
> compatibility tests and they have helped us find multiple regressions in 
> the
> past.
>
> On Wed, Oct 21, 2020 at 3:42 AM Ahmet Altay  wrote:
>>
>> +1 - I verified python quickstarts.
>>
>> On Tue, Oct 20, 2020 at 11:36 AM Robin Qiu  wrote:
>>>
>>> Hi everyone,
>>> Please review and vote on the release candidate #2 for the version 
>>> 2.25.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> The complete staging area is available for your review, which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org 
>>> [2], which is signed with the key with fingerprint 
>>> AD70476B9D1AF3EFEC2208165952E71AACAF911D [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.25.0-RC2" [5],
>>> * website pull request listing the release [6], publishing the API 
>>> reference manual [7], and the blog post [8].
>>> * Java artifacts were built with Maven 3.5.3 and OpenJDK 1.8.0
>>> * Python artifacts are deployed along with the source release to the 
>>> dist.apache.org [2].
>>> * Validation sheet with a tab for 2.25.0 release to help with 
>>> validation [9].
>>> * Docker images published to Docker Hub [10].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority 
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>> Robin
>>>
>>> [1] 
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347147
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.25.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4] 
>>> https://repository.apache.org/content/repositories/orgapachebeam-1142
>>> [5] https://github.com/apache/beam/tree/v2.25.0-RC2
>>> [6] https://github.com/apache/beam/pull/13130
>>> [7] https://github.com/apache/beam-site/pull/608
>>> [8] https://github.com/apache/beam/pull/13131
>>> [9] 
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1494345946
>>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image


Re: [DISCUSS] Sensible dependency upgrades

2020-10-23 Thread Robert Bradshaw
On Fri, Oct 23, 2020 at 10:16 AM Luke Cwik  wrote:
>
> An additional thing I forgot to mention was that if we only had portable 
> runners our BOM story would be simplified since we wouldn't have the runner 
> on the classpath and users would have a consistent experience across runners 
> with regards to dependency convergence.

While that may be true in principle, I think that once we move
everything over to portable runners there will still be a strong
desire to use "embedded" rather than "docker" environments for the
pure-java usecases, which would require compatible classpaths.


> On Fri, Oct 23, 2020 at 6:15 AM Piotr Szuberski  
> wrote:
>>
>> Thank you for pointing it out. The awareness problem fits me well here - I 
>> have a good lesson to discuss things on the devlist.
>>
>> About SolrIO - I'll create a thread on @users to discuss which versions 
>> should be supported and make relevant changes after getting a conclusion.
>>
>> On 2020/10/22 14:24:45, Ismaël Mejía  wrote:
>> > I have seen ongoing work on upgrading dependencies, this is a great task 
>> > needed
>> > for the health of the project and its IO connectors, however I am a bit 
>> > worried
>> > on the impact of these on existing users. We should be aware that we 
>> > support old
>> > versions of the clients for valid reasons. If we update a version of a 
>> > client we
>> > should ensure that it still interacts correctly with existing users and 
>> > runtime
>> > systems. Basically we need two conditions:
>> >
>> > 1. We cannot update dependencies without considering the current use of 
>> > them.
>> > 2. We must avoid upgrading to a non-stable or non-LTS dependency version
>> >
>> > For (1) in a recent thread Piotr brought up some issues about updating Hadoop
>> > dependencies to version 3. This surprised me because the whole Big Data
>> > ecosystem is just catching up with Hadoop 3  (Flink does not even release
>> > artifacts for this yet, and Spark just started on version 3 some months 
>> > ago),
>> > which means that most of our users still need us to guarantee
>> > compatibility with Hadoop 2.x dependencies.
>> >
>> > The Hadoop dependencies are mostly 'provided' so a way to achieve this is 
>> > by
>> > creating new test configurations that guarantee backwards (or forwards)
>> > compatibility by providing the respective versions. This is similar to 
>> > what we
>> > do currently in KafkaIO by using by default version 1.0.0 but testing
>> > compatibility with 2.1.0 by providing the right dependencies too.
>> >
>> > The same thread also discusses upgrading to the latest version, 3.3.x, but
>> > per (2) we should not consider upgrades to non-stable versions; the stable
>> > version of Hadoop is currently 3.2.1. https://hadoop.apache.org/docs/stable/
>> >
>> > I also saw a recent upgrade of SolrIO to version 8 which may affect some 
>> > users
>> > of previous versions with no discussion about it on the mailing lists and 
>> > no
>> > backwards compatibility guarantees.
>> > https://github.com/apache/beam/pull/13027
>> >
>> > In the Solr case I think this update probably makes more sense, since Solr
>> > 5.x is deprecated and fewer people would probably be impacted, but it still
>> > would have been good to discuss this on user@
>> >
>> > I don't know how we can find a good equilibrium between maintainers and
>> > users when deciding on those upgrades without adding much overhead. Should
>> > we have a VOTE maybe for the most sensible dependencies? Or just assume
>> > this is a criterion for the maintainers? I am afraid we may end up with
>> > incompatible changes due to the lack of awareness, or for not much in
>> > return; but at the same time I wonder if it makes sense to add the extra
>> > work of discussion for minor dependencies where this matters less.
>> >
>> > Should we maybe document the sensible dependency upgrades (the recent
>> > thread on the Avro upgrade comes to my mind too)? Or should we have the
>> > same criteria for all? Other ideas?
>> >
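
The 'provided'-plus-test-configuration approach Ismaël describes (default version for compilation, alternative versions supplied only at test runtime, as KafkaIO does) could be sketched in Gradle roughly like this. The property name, module coordinates, and versions below are illustrative assumptions, not taken from the Beam build:

```groovy
// Hypothetical Gradle sketch: Hadoop stays 'provided' (compileOnly), and the
// version actually used by tests is swappable via a project property.
def hadoopVersion = project.findProperty('hadoopVersion') ?: '2.10.1'

dependencies {
  compileOnly "org.apache.hadoop:hadoop-common:$hadoopVersion"
  testRuntimeOnly "org.apache.hadoop:hadoop-common:$hadoopVersion"
}

// Run the same test suite against Hadoop 3 for a compatibility check:
//   ./gradlew test -PhadoopVersion=3.2.1
```

This keeps the default build on the version most users run while still exercising the newer major version in CI.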


Re: [DISCUSS] Sensible dependency upgrades

2020-10-23 Thread Luke Cwik
An additional thing I forgot to mention was that if we only had portable
runners our BOM story would be simplified since we wouldn't have the runner
on the classpath and users would have a consistent experience across
runners with regards to dependency convergence.

On Fri, Oct 23, 2020 at 6:15 AM Piotr Szuberski 
wrote:

> Thank you for pointing it out. The awareness problem fits me well here - I
> have a good lesson to discuss things on the devlist.
>
> About SolrIO - I'll create a thread on @users to discuss which versions
> should be supported and make relevant changes after getting a conclusion.
>
> On 2020/10/22 14:24:45, Ismaël Mejía  wrote:
> > I have seen ongoing work on upgrading dependencies, this is a great task
> needed
> > for the health of the project and its IO connectors, however I am a bit
> worried
> > on the impact of these on existing users. We should be aware that we
> support old
> > versions of the clients for valid reasons. If we update a version of a
> client we
> > should ensure that it still interacts correctly with existing users and
> runtime
> > systems. Basically we need two conditions:
> >
> > 1. We cannot update dependencies without considering the current use of
> them.
> > 2. We must avoid upgrading to a non-stable or non-LTS dependency version
> >
> > For (1) in a recent thread Piotr brought up some issues about updating Hadoop
> > dependencies to version 3. This surprised me because the whole Big Data
> > ecosystem is just catching up with Hadoop 3  (Flink does not even release
> > artifacts for this yet, and Spark just started on version 3 some months
> ago),
> > which means that most of our users still need us to guarantee
> > compatibility with Hadoop 2.x dependencies.
> >
> > The Hadoop dependencies are mostly 'provided' so a way to achieve this
> is by
> > creating new test configurations that guarantee backwards (or forwards)
> > compatibility by providing the respective versions. This is similar to
> what we
> > do currently in KafkaIO by using by default version 1.0.0 but testing
> > compatibility with 2.1.0 by providing the right dependencies too.
> >
> > The same thread also discusses upgrading to the latest version, 3.3.x,
> > but per (2) we should not consider upgrades to non-stable versions; the
> > stable version of Hadoop is currently 3.2.1.
> > https://hadoop.apache.org/docs/stable/
> >
> > I also saw a recent upgrade of SolrIO to version 8 which may affect some
> users
> > of previous versions with no discussion about it on the mailing lists
> and no
> > backwards compatibility guarantees.
> > https://github.com/apache/beam/pull/13027
> >
> > In the Solr case I think this update probably makes more sense, since
> > Solr 5.x is deprecated and fewer people would probably be impacted, but
> > it still would have been good to discuss this on user@
> >
> > I don't know how we can find a good equilibrium between maintainers and
> > users when deciding on those upgrades without adding much overhead. Should
> > we have a VOTE maybe for the most sensible dependencies? Or just assume
> > this is a criterion for the maintainers? I am afraid we may end up with
> > incompatible changes due to the lack of awareness, or for not much in
> > return; but at the same time I wonder if it makes sense to add the extra
> > work of discussion for minor dependencies where this matters less.
> >
> > Should we maybe document the sensible dependency upgrades (the recent
> > thread on the Avro upgrade comes to my mind too)? Or should we have the
> > same criteria for all? Other ideas?
> >
>


Re: Updating elasticsearch version to 7.9.2 - problem with HadoopFormatIOElasticTest that uses ES emulator

2020-10-23 Thread Piotr Szuberski
I've already found the assumptions - they were in elastic_test_data.py. The
relevant PR is at https://github.com/apache/beam/pull/13085

On 2020/10/22 18:15:11, Tyson Hamilton  wrote: 
> IMO it really comes down to stability & runtime differences. If there are
> no significant changes to either of these then keeping it as a
> precommit and using test containers is fine. Where are the assumptions in
> the IT test, in HadoopFormatIOElasticTest?
> 
> On Mon, Oct 12, 2020 at 10:10 AM Piotr Szuberski <
> piotr.szuber...@polidea.com> wrote:
> 
> > I'm trying to update the elasticsearch version to 7.9.2 but I've encountered a
> > problem with HadoopFormatIOElasticTest, which uses an ES in-memory emulator that
> > is no longer supported:
> > https://stackoverflow.com/questions/51316813/elastic-node-on-local-in-6-2
> >
> > It's recommended to use testcontainers as proposed here
> > https://github.com/allegro/embedded-elasticsearch but it would transform
> > the in-memory test to integration test (which has to be done anyway)
> >
> > There is also the Elasticsearch test framework with ESSingleNodeTestCase, but
> > it causes a jar hell problem and I don't think it's easily solvable - the
> > dependencies in "java core" and "java core test".
> > I tried to
> >
> > Is running the precommit test with testcontainers acceptable? It's the
> > easiest fix.
> >
> > About the integration test:
> > I'd like to enable the IT test in Java PostCommit but there are some
> > assumptions about the data that is already written to Elasticsearch but I
> > can't find anywhere what that data should be (Probably something like
> > Item_Price0, Item_Price1 etc but I'm not sure)
> >
> 
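
Replacing the embedded node with Testcontainers, as proposed in this thread, would look roughly like the sketch below. The Docker image coordinates are an assumption based on the 7.9.2 version under discussion, and the test would of course need a Docker daemon and the testcontainers-elasticsearch dependency:

```java
import org.testcontainers.elasticsearch.ElasticsearchContainer;

// Hedged sketch: spin up a real Elasticsearch 7.9.2 in a container and
// point the test's REST client at it instead of the removed in-memory node.
public class ElasticsearchContainerSketch {
  public static void main(String[] args) {
    try (ElasticsearchContainer es = new ElasticsearchContainer(
        "docker.elastic.co/elasticsearch/elasticsearch:7.9.2")) {
      es.start();
      // e.g. "localhost:32768" - use this as the connection string in the
      // HadoopFormatIOElasticTest configuration.
      System.out.println(es.getHttpHostAddress());
    }
  }
}
```

As the thread notes, this turns the in-memory precommit test into something closer to an integration test, since it now depends on Docker being available.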


Re: [DISCUSS] Sensible dependency upgrades

2020-10-23 Thread Piotr Szuberski
Thank you for pointing it out. The awareness problem fits me well here - I have 
a good lesson to discuss things on the devlist.

About SolrIO - I'll create a thread on @users to discuss which versions should 
be supported and make relevant changes after getting a conclusion.

On 2020/10/22 14:24:45, Ismaël Mejía  wrote: 
> I have seen ongoing work on upgrading dependencies, this is a great task 
> needed
> for the health of the project and its IO connectors, however I am a bit 
> worried
> on the impact of these on existing users. We should be aware that we support 
> old
> versions of the clients for valid reasons. If we update a version of a client 
> we
> should ensure that it still interacts correctly with existing users and 
> runtime
> systems. Basically we need two conditions:
> 
> 1. We cannot update dependencies without considering the current use of them.
> 2. We must avoid upgrading to a non-stable or non-LTS dependency version
> 
> For (1) in a recent thread Piotr brought up some issues about updating Hadoop
> dependencies to version 3. This surprised me because the whole Big Data
> ecosystem is just catching up with Hadoop 3  (Flink does not even release
> artifacts for this yet, and Spark just started on version 3 some months ago),
> which means that most of our users still need us to guarantee compatibility
> with Hadoop 2.x dependencies.
> 
> The Hadoop dependencies are mostly 'provided' so a way to achieve this is by
> creating new test configurations that guarantee backwards (or forwards)
> compatibility by providing the respective versions. This is similar to what we
> do currently in KafkaIO by using by default version 1.0.0 but testing
> compatibility with 2.1.0 by providing the right dependencies too.
> 
> The same thread also discusses upgrading to the latest version, 3.3.x, but per
> (2) we should not consider upgrades to non-stable versions; the stable version
> of Hadoop is currently 3.2.1. https://hadoop.apache.org/docs/stable/
> 
> I also saw a recent upgrade of SolrIO to version 8 which may affect some users
> of previous versions with no discussion about it on the mailing lists and no
> backwards compatibility guarantees.
> https://github.com/apache/beam/pull/13027
> 
> In the Solr case I think this update probably makes more sense, since Solr 5.x
> is deprecated and fewer people would probably be impacted, but it still would
> have been good to discuss this on user@
> 
> I don't know how we can find a good equilibrium between maintainers and
> users when deciding on those upgrades without adding much overhead. Should
> we have a VOTE maybe for the most sensible dependencies? Or just assume
> this is a criterion for the maintainers? I am afraid we may end up with
> incompatible changes due to the lack of awareness, or for not much in
> return; but at the same time I wonder if it makes sense to add the extra
> work of discussion for minor dependencies where this matters less.
> 
> Should we maybe document the sensible dependency upgrades (the recent
> thread on the Avro upgrade comes to my mind too)? Or should we have the
> same criteria for all? Other ideas?
> 


Re: Apache Beam case studies

2020-10-23 Thread Karolina Rosół
Hi,

@Luke thanks a mill for cc'ing Mariann :-) Let's see if there's any
interest.

Kind regards,

Karolina Rosół
Polidea  | Head of Cloud & OSS

M: +48 606 630 236 <+48606630236>
E: karolina.ro...@polidea.com


On Wed, Oct 21, 2020 at 5:50 PM Luke Cwik  wrote:

> +Mariann Nagy  has been doing things like this for
> years now and may be interested.
>
> On Wed, Oct 21, 2020 at 12:50 AM Karolina Rosół <
> karolina.ro...@polidea.com> wrote:
>
>> Hi folks,
>>
>> With some people from Polidea we came up with an idea to carry out
>> interviews with Apache Beam users to spread the news about the Beam model
>> and engage more people to use it.
>>
>> Ideally, we'd set up an online meeting with interested people and then do
>> an interview. We'd like to ask questions such as 'how did you find out
>> about Apache Beam' / 'how do you use Apache Beam in your company/product?'
>> etc. We'd love to post the whole interview on Polidea and Apache Beam
>> website.
>>
>> If any of you is interested, please let me know in this thread :-)
>>
>> Wish you all a happy week and stay safe!
>>
>> Karolina Rosół
>> Polidea  | Head of Cloud & OSS
>>
>> M: +48 606 630 236 <+48606630236>
>> E: karolina.ro...@polidea.com
>>
>