Re: Beam contribution

2019-04-03 Thread Csaba Kassai
Oh, I just missed it then :)
Thank you Lukasz for connecting us.

Yeah, the two TimerReceiverTest tests fail reliably for me.





On Tue, 2 Apr 2019 at 23:53, Lukasz Cwik  wrote:

> +Ahmed
>
> I have added you as a contributor.
>
> It seems as though Ahmed had just picked up BEAM-3489 yesterday. Reach out
> to Ahmed if you would like to help them out with the task.
>
> Was TimerReceiverTest failing reliably when performing a parallel build or
> is it flaky?
>
> I have asked Chamikara to take a look at PR 8180.
>
>
> On Tue, Apr 2, 2019 at 8:33 AM Csaba Kassai  wrote:
>
>> Hi All!
>>
>> I am Csabi, I would be happy to contribute to Beam.
>> Could you grant me the contributor role and assign issue BEAM-3489
>> to me? My user name
>> is "csabakassai".
>>
>> After I checked out the code and tried to do a gradle check I found these
>> issues:
>>
>>1. *JUnit tests fail:* the TimerReceiverTest fails in the
>>":beam-runners-google-cloud-dataflow-java-fn-api-worker:test" and the
>>":beam-runners-google-cloud-dataflow-java-legacy-worker:test" tasks. When I
>>execute the tests independently everything is fine, so I disabled the
>>parallel build and that solves the problem. I have not investigated further;
>>do you have any more insights on this issue? I have attached the test reports.
>>2. *Python test fails*: there is a Python test which fails if the
>>current offset of your timezone differs from its offset in 1970. In my case
>>Singapore is now GMT+8 and it was GMT+7:30 in 1970. I created a ticket
>>for this issue where I describe the problem in detail:
>>https://jira.apache.org/jira/browse/BEAM-6947. Could you assign the
>>ticket to me? Also, I created a PR with a possible fix:
>>https://github.com/apache/beam/pull/8180. Could you suggest a
>>reviewer?
>>
>>
>> Thank you,
>> Csabi
>>
>>
>>
>>
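A minimal java.time sketch of the timezone pitfall behind BEAM-6947 (an illustration of the root cause only, not the actual fix proposed in PR 8180): Asia/Singapore was UTC+07:30 at the Unix epoch but has been UTC+08:00 since 1982, so any conversion that applies today's offset to a 1970 timestamp is off by 30 minutes.

import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

public class SingaporeOffset {
  public static void main(String[] args) {
    ZoneRules rules = ZoneId.of("Asia/Singapore").getRules();
    // Offset in force at the Unix epoch (1970-01-01T00:00:00Z): +07:30.
    System.out.println(rules.getOffset(Instant.EPOCH));
    // Offset in force today: +08:00 (in effect since 1982).
    System.out.println(rules.getOffset(Instant.now()));
  }
}

A test that converts a 1970 local time to an epoch value using the machine's current offset will therefore disagree with one that uses the historical rules, which matches the failure described above.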


Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-03 Thread Kenneth Knowles
Agree that a coder URN defines the encoding. I see that string UTF-8 was
added to the proto enum, but it needs a written spec of the encoding.
Ideally there would also be some test data that different languages can use
to drive compliance testing.

Kenn
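A sketch of the kind of compliance test data mentioned above, with the fixture format and values invented for illustration: each entry maps a Unicode string to the hex bytes every SDK's UTF-8 coder should produce once the spec settles on raw UTF-8 versus a length-prefixed form.

import java.util.Map;

public class Utf8CoderTestVectors {
  // Hypothetical fixture: string -> expected raw UTF-8 bytes (hex).
  // A length-prefixed variant would prepend the varint byte length.
  static final Map<String, String> RAW_UTF8_HEX = Map.of(
      "", "",
      "beam", "6265616d",
      "\u00e9", "c3a9",    // "é"
      "\u6c34", "e6b0b4"); // "水"
}

Each SDK would then assert that its coder reproduces these bytes, so a discrepancy like the one discussed below surfaces as an ordinary test failure.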

On Wed, Apr 3, 2019 at 6:21 PM Robert Burke  wrote:

> String UTF8 was recently added as a "standard coder" URN in the protos,
> but I don't think that developed beyond Java, so adding it to Python would
> be reasonable in my opinion.
>
> The Go SDK presently handles strings as "custom coders", which for Go are
> always length-prefixed (and reported to the runner as LP+CustomCoder). It
> would be straightforward to add the correct handling for strings, as Go
> natively treats strings as UTF-8.
>
>
> On Wed, Apr 3, 2019, 5:03 PM Heejong Lee  wrote:
>
>> Hi all,
>>
>> It looks like the UTF-8 string coders in the Java and Python SDKs use
>> different encoding schemes. StringUtf8Coder in the Java SDK puts the varint
>> length of the input string before the actual data bytes, whereas
>> StrUtf8Coder in the Python SDK directly encodes the input string to its
>> bytes value. For the last few weeks, I've been testing and fixing
>> cross-language IO transforms and this discrepancy is a major blocker for
>> me. IMO, we should unify the encoding scheme of UTF-8 strings across the
>> different SDKs and make it a standard coder. Any thoughts?
>>
>> Thanks,
>>
>


Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Kenneth Knowles
This all makes me think that we should rethink how we ship experimental
features. My experience is also that (1) users don't know if something is
experimental or don't think hard about it and (2) we don't use the
experimental period to gather feedback and make changes.

How can we change both of these? Perhaps we could require experimental
features to be opt-in. Flags work, as do clearly marked experimental
dependencies that a user has to add. Changes to the core are sometimes tricky
to put behind a flag but rarely impossible. This way a contributor is also
motivated to gather feedback to mature their feature so that it becomes the
default instead of opt-in.
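As a sketch of that opt-in shape, with the flag name and plumbing invented for illustration (Beam's existing notion of experiments could presumably carry the flag):

import java.util.List;

// Hypothetical helper: an experimental code path stays dark unless the user
// explicitly opts in, so shipping it does not implicitly commit us to it.
public class Experiments {
  public static boolean isEnabled(List<String> experiments, String name) {
    return experiments != null && experiments.contains(name);
  }
}

// Usage sketch: if (Experiments.isEnabled(opts, "use_new_trigger_api")) { ... }

The feature's author then has to drive users from the flag to the default, which is exactly the feedback loop described above.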

The need that @Experimental was trying to address is real. We *do* need a
way to try things and get feedback prior to committing to forever support.
We have discovered real problems far too late, or not had the will to fix
the issues we did find:
 - many trigger combinators should probably be deleted
 - many triggers cannot meet a good spec with merging windows
 - the continuation trigger idea doesn't work well
 - CombineFn had to have its spec changed in order to be both correct and
efficient
 - OutputTimeFn as a UDF is convenient for Java but it turns out an enum is
better for portability
 - Coder contexts turned out to be a major usability problem
 - The built-in data types for schemas are evolving (luckily these are
really being worked on!)

That's just what I can think of off the top of my head. I expect the
examples from IOs are more numerous; in that case it is pretty easy to fork
and make a new and better IO.

And as an extreme view, I would prefer that if we add a deadline for
experimental features, our default action at that deadline is to remove them,
not declare them stable. If no one is trying to mature a feature and get it
out of opt-in status, then it probably has not matured. And perhaps if no one
cares enough to do that work, it also isn't that important.

Kenn

On Wed, Apr 3, 2019 at 5:57 PM Ahmet Altay  wrote:

> I agree with Reuven that our experimental annotation is not useful any
> more. For example Datastore IO in python sdk is experimental for 2 years
> now. Even though it is marked as experimental an upgrade is carefully
> planned [1] as if it is not experimental. Given that I do not think we can
> remove features within a small number of minor releases. (Exception to this
> would be, if we have a clear knowledge of very low usage of a certain IO.)
>
> I am worried that tagging experimental features with release versions will
> add toil to the release process as mentioned and will also add to the user
> confusion. What would be the signal to a user if they see an experimental
> feature target release bumped between releases? How about tagging
> experimental features with JIRAs (similar to TODOs) with an action to
> either promote them as supported features or remove them? These JIRAs could
> have fix version targets as any other release blocking JIRAs. It will also
> clarify who is responsible for a given experimental feature.
>
> [1]
> https://lists.apache.org/thread.html/5ec88967aa4a382db07a60e0101c4eb36165909076867155ab3546a6@%3Cdev.beam.apache.org%3E
>
> On Wed, Apr 3, 2019 at 5:24 PM Reuven Lax  wrote:
>
>> Experiments are already tagged with a Kind enum
>> (e.g. @Experimental(Kind.Schemas)).
>>
>
> This not the case for python's annotations. It will be a good idea to add
> there as well.
>
>
>>
>> On Wed, Apr 3, 2019 at 4:56 PM Ankur Goenka  wrote:
>>
>>> I think a release version with Experimental flag makes sense.
>>> In addition, I think many of our user start to rely on experimental
>>> features because they are not even aware that these features are
>>> experimental and its really hard to find the experimental features used
>>> without giving a good look at the Beam code and having some knowledge about
>>> it.
>>>
>>> It will be good it we can have a step at the pipeline submission time
>>> which can print all the experiments used in verbose mode. This might also
>>> require to add a meaningful group name for the experiment example
>>>
>>> @Experimental("SDF", 2.15.0)
>>>
>>> This will of-course add additional effort and require additional context
>>> while tagging experiments.
>>>
>>> On Wed, Apr 3, 2019 at 4:43 PM Reuven Lax  wrote:
>>>
 Our Experimental annotation has become almost useless. Many core,
 widely-used parts of the API (e.g. triggers) are still all marked as
 experimental. So many users use these features that we couldn't really
 change them (in a backwards-incompatible) without hurting many users, so
 the fact they are marked Experimental has become a fiction.

 Could we add a deadline to the Experimental tag - a release version
 when it will be removed? e.g.

 @Experimental(2.15.0)

 We can have a test that ensure that the tag is removed at this version.
 Of course if we're not ready to remove experimental by that version, it's
 fine - we can always bump the tagged version. However this forces us to think about each one.

Changes in Beam Jenkins Agents

2019-04-03 Thread Yifan Zou
Hi,

Our Jenkins agents are in bad condition. 8 agents are down at this time, and
they are not going to be restored because of errors that happened on
the Puppet server due to the constant re-provisioning. There have been several
discussions in recent weeks between ASF Infra and us. In general,
the Infra team will move away from Puppet for third-party build nodes, and we
will need to manage the agents ourselves.

I wrote a one-pager to describe the problem and the approach. We created a
new Jenkins node (https://builds.apache.org/computer/beam17-jnlp/) for
experimental purposes. We're working to verify the environment by
running all Beam tests on it to make sure the required tools are
installed properly. The process is tracked in the attached spreadsheet. Any
help verifying tests on the new machine is appreciated! The
instructions are at the top of the spreadsheet.

One pager:
https://docs.google.com/document/d/1c38IPrF94PZC-ItGZgmAgAKrgmC1MGA6N6nkK0cL6L4/edit?ts=5ca54b3e#heading=h.lm27uybdtpys

Verification tracking sheet:
https://docs.google.com/spreadsheets/d/1MDL6vy_0iaFSZeWQ-4JWKlRiZ5WFdDVjJh6Xvczgld0/edit?ts=5ca54b2d#gid=0

Thanks.

Regards.
Yifan


Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-03 Thread Robert Burke
String UTF8 was recently added as a "standard coder" URN in the protos,
but I don't think that developed beyond Java, so adding it to Python would
be reasonable in my opinion.

The Go SDK presently handles strings as "custom coders", which for Go are
always length-prefixed (and reported to the runner as LP+CustomCoder). It
would be straightforward to add the correct handling for strings, as Go
natively treats strings as UTF-8.


On Wed, Apr 3, 2019, 5:03 PM Heejong Lee  wrote:

> Hi all,
>
> It looks like the UTF-8 string coders in the Java and Python SDKs use
> different encoding schemes. StringUtf8Coder in the Java SDK puts the varint
> length of the input string before the actual data bytes, whereas StrUtf8Coder
> in the Python SDK directly encodes the input string to its bytes value. For
> the last few weeks, I've been testing and fixing cross-language IO transforms
> and this discrepancy is a major blocker for me. IMO, we should unify the
> encoding scheme of UTF-8 strings across the different SDKs and make it a
> standard coder. Any thoughts?
>
> Thanks,
>


Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Ahmet Altay
I agree with Reuven that our experimental annotation is not useful any
more. For example, Datastore IO in the Python SDK has been experimental for
2 years now. Even though it is marked as experimental, an upgrade is
carefully planned [1] as if it were not experimental. Given that, I do not
think we can remove features within a small number of minor releases. (An
exception to this would be if we have clear knowledge of very low usage of a
certain IO.)

I am worried that tagging experimental features with release versions will
add toil to the release process, as mentioned, and will also add to user
confusion. What would be the signal to a user if they see an experimental
feature's target release bumped between releases? How about tagging
experimental features with JIRAs (similar to TODOs) with an action to either
promote them to supported features or remove them? These JIRAs could have
fix version targets like any other release-blocking JIRAs. It would also
clarify who is responsible for a given experimental feature.

[1]
https://lists.apache.org/thread.html/5ec88967aa4a382db07a60e0101c4eb36165909076867155ab3546a6@%3Cdev.beam.apache.org%3E

On Wed, Apr 3, 2019 at 5:24 PM Reuven Lax  wrote:

> Experiments are already tagged with a Kind enum
> (e.g. @Experimental(Kind.Schemas)).
>

This is not the case for Python's annotations. It would be a good idea to
add it there as well.


>
> On Wed, Apr 3, 2019 at 4:56 PM Ankur Goenka  wrote:
>
>> I think a release version with Experimental flag makes sense.
>> In addition, I think many of our user start to rely on experimental
>> features because they are not even aware that these features are
>> experimental and its really hard to find the experimental features used
>> without giving a good look at the Beam code and having some knowledge about
>> it.
>>
>> It will be good it we can have a step at the pipeline submission time
>> which can print all the experiments used in verbose mode. This might also
>> require to add a meaningful group name for the experiment example
>>
>> @Experimental("SDF", 2.15.0)
>>
>> This will of-course add additional effort and require additional context
>> while tagging experiments.
>>
>> On Wed, Apr 3, 2019 at 4:43 PM Reuven Lax  wrote:
>>
>>> Our Experimental annotation has become almost useless. Many core,
>>> widely-used parts of the API (e.g. triggers) are still all marked as
>>> experimental. So many users use these features that we couldn't really
>>> change them (in a backwards-incompatible) without hurting many users, so
>>> the fact they are marked Experimental has become a fiction.
>>>
>>> Could we add a deadline to the Experimental tag - a release version when
>>> it will be removed? e.g.
>>>
>>> @Experimental(2.15.0)
>>>
>>> We can have a test that ensure that the tag is removed at this version.
>>> Of course if we're not ready to remove experimental by that version, it's
>>> fine - we can always bump the tagged version. However this forces us to
>>> think about each one.
>>>
>>> Downside - it might add more toil to the existing release process.
>>>
>>> Reuven
>>>
>>>
>>> On Wed, Apr 3, 2019 at 4:00 PM Kyle Weaver  wrote:
>>>
 > We might also want to get in the habit of reviewing if something
 should no longer be experimental.

 +1

 Kyle Weaver |  Software Engineer |  kcwea...@google.com |  +1650203


 On Wed, Apr 3, 2019 at 3:53 PM Kenneth Knowles  wrote:

> I think option 2 with n=1 minor version seems OK. So users get the
> message for one release and it is gone the next. We should make sure the
> deprecation warning says "this is an experimental feature, so it will be
> removed after 1 minor version". And we need a process for doing it so it
> doesn't sit around. I think we should also leave room for using our own
> judgment about whether the user pain is very little and then it is not
> needed to have a deprecation cycle.
>
> We might also want to get in the habit of reviewing if something
> should no longer be experimental.
>
> Kenn
>
> On Wed, Apr 3, 2019 at 2:33 PM Ismaël Mejía  wrote:
>
>> When we did the first stable release of Beam (2.0.0) we decided to
>> annotate most of the Beam IOs as @Experimental because we were
>> cautious about not getting the APIs right in the first try. This was a
>> really good decision because we could do serious improvements and
>> refactorings to them in the first releases without the hassle of
>> keeping backwards compatibility. However after some more releases
>> users started to rely on features and supported versions, so we ended
>> up in a situation where we could not change them arbitrarily without
>> consequences to the final users.
>>
>> So we started to deprecate some features and parts of the API without
>> removing them, e.g. the introduction of HadoopFormatIO deprecated
>> HadoopInputFormatIO, we deprecated methods of MongoDbIO and MqttIO to

Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Reuven Lax
Experiments are already tagged with a Kind enum
(e.g. @Experimental(Kind.Schemas)).

On Wed, Apr 3, 2019 at 4:56 PM Ankur Goenka  wrote:

> I think a release version with Experimental flag makes sense.
> In addition, I think many of our user start to rely on experimental
> features because they are not even aware that these features are
> experimental and its really hard to find the experimental features used
> without giving a good look at the Beam code and having some knowledge about
> it.
>
> It will be good it we can have a step at the pipeline submission time
> which can print all the experiments used in verbose mode. This might also
> require to add a meaningful group name for the experiment example
>
> @Experimental("SDF", 2.15.0)
>
> This will of-course add additional effort and require additional context
> while tagging experiments.
>
> On Wed, Apr 3, 2019 at 4:43 PM Reuven Lax  wrote:
>
>> Our Experimental annotation has become almost useless. Many core,
>> widely-used parts of the API (e.g. triggers) are still all marked as
>> experimental. So many users use these features that we couldn't really
>> change them (in a backwards-incompatible) without hurting many users, so
>> the fact they are marked Experimental has become a fiction.
>>
>> Could we add a deadline to the Experimental tag - a release version when
>> it will be removed? e.g.
>>
>> @Experimental(2.15.0)
>>
>> We can have a test that ensure that the tag is removed at this version.
>> Of course if we're not ready to remove experimental by that version, it's
>> fine - we can always bump the tagged version. However this forces us to
>> think about each one.
>>
>> Downside - it might add more toil to the existing release process.
>>
>> Reuven
>>
>>
>> On Wed, Apr 3, 2019 at 4:00 PM Kyle Weaver  wrote:
>>
>>> > We might also want to get in the habit of reviewing if something
>>> should no longer be experimental.
>>>
>>> +1
>>>
>>> Kyle Weaver |  Software Engineer |  kcwea...@google.com |  +1650203
>>>
>>>
>>> On Wed, Apr 3, 2019 at 3:53 PM Kenneth Knowles  wrote:
>>>
 I think option 2 with n=1 minor version seems OK. So users get the
 message for one release and it is gone the next. We should make sure the
 deprecation warning says "this is an experimental feature, so it will be
 removed after 1 minor version". And we need a process for doing it so it
 doesn't sit around. I think we should also leave room for using our own
 judgment about whether the user pain is very little and then it is not
 needed to have a deprecation cycle.

 We might also want to get in the habit of reviewing if something should
 no longer be experimental.

 Kenn

 On Wed, Apr 3, 2019 at 2:33 PM Ismaël Mejía  wrote:

> When we did the first stable release of Beam (2.0.0) we decided to
> annotate most of the Beam IOs as @Experimental because we were
> cautious about not getting the APIs right in the first try. This was a
> really good decision because we could do serious improvements and
> refactorings to them in the first releases without the hassle of
> keeping backwards compatibility. However after some more releases
> users started to rely on features and supported versions, so we ended
> up in a situation where we could not change them arbitrarily without
> consequences to the final users.
>
> So we started to deprecate some features and parts of the API without
> removing them, e.g. the introduction of HadoopFormatIO deprecated
> HadoopInputFormatIO, we deprecated methods of MongoDbIO and MqttIO to
> improve the APIs (in most cases with valid/improved replacements), and
> recently it was discussed to removal of support for older versions in
> KafkaIO.
>
> Keeping deprecated stuff in experimental APIs does not seem to make
> sense, but it is what he have started to do to be ‘user friendly’, but
> it is probably a good moment to define, what should be the clear path
> for removal and breaking changes of experimental features, some
> options:
>
> 1. Stay as we were, do not mark things as deprecated and remove them
> at will because this is the contract of @Experimental.
> 2. Deprecate stuff and remove it after n versions (where n could be 3
> releases).
> 3. Deprecate stuff and remove it just after a new LTS is decided to
> ensure users who need these features may still have them for some
> time.
>
> I would like to know your opinions about this, or if you have other
> ideas. Notice that in discussion I refer only to @Experimental
> features.
>



[DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-03 Thread Heejong Lee
Hi all,

It looks like the UTF-8 string coders in the Java and Python SDKs use
different encoding schemes. StringUtf8Coder in the Java SDK puts the varint
length of the input string before the actual data bytes, whereas StrUtf8Coder
in the Python SDK directly encodes the input string to its bytes value. For
the last few weeks, I've been testing and fixing cross-language IO transforms
and this discrepancy is a major blocker for me. IMO, we should unify the
encoding scheme of UTF-8 strings across the different SDKs and make it a
standard coder. Any thoughts?

Thanks,
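To make the discrepancy concrete, here is a small self-contained sketch of the two schemes (assuming the Java coder's length prefix is the usual base-128 varint and applies in the nested context; the exact framing is what a written spec would pin down):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class Utf8EncodingComparison {

  // Python StrUtf8Coder style: just the raw UTF-8 bytes, no framing.
  static byte[] encodeRaw(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Java StringUtf8Coder style: base-128 varint byte length, then the UTF-8 bytes.
  static byte[] encodeLengthPrefixed(String s) throws IOException {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int n = utf8.length;
    while ((n & ~0x7F) != 0) {
      out.write((n & 0x7F) | 0x80);
      n >>>= 7;
    }
    out.write(n);
    out.write(utf8);
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // "beam" encodes to 62 65 61 6d raw and 04 62 65 61 6d length-prefixed, so a
    // string written by one SDK cannot be read back by the other in a
    // cross-language pipeline.
    System.out.println(encodeRaw("beam").length);             // 4
    System.out.println(encodeLengthPrefixed("beam").length);  // 5
  }
}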


Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Ankur Goenka
I think a release version with the Experimental flag makes sense.
In addition, I think many of our users start to rely on experimental
features because they are not even aware that these features are
experimental, and it's really hard to find the experimental features in use
without taking a good look at the Beam code and having some knowledge about
it.

It would be good if we could have a step at pipeline submission time which
prints all the experiments used in verbose mode. This might also require
adding a meaningful group name for the experiment, for example:

@Experimental("SDF", 2.15.0)

This will of course add additional effort and require additional context
while tagging experiments.

On Wed, Apr 3, 2019 at 4:43 PM Reuven Lax  wrote:

> Our Experimental annotation has become almost useless. Many core,
> widely-used parts of the API (e.g. triggers) are still all marked as
> experimental. So many users use these features that we couldn't really
> change them (in a backwards-incompatible) without hurting many users, so
> the fact they are marked Experimental has become a fiction.
>
> Could we add a deadline to the Experimental tag - a release version when
> it will be removed? e.g.
>
> @Experimental(2.15.0)
>
> We can have a test that ensure that the tag is removed at this version. Of
> course if we're not ready to remove experimental by that version, it's fine
> - we can always bump the tagged version. However this forces us to think
> about each one.
>
> Downside - it might add more toil to the existing release process.
>
> Reuven
>
>
> On Wed, Apr 3, 2019 at 4:00 PM Kyle Weaver  wrote:
>
>> > We might also want to get in the habit of reviewing if something should
>> no longer be experimental.
>>
>> +1
>>
>> Kyle Weaver |  Software Engineer |  kcwea...@google.com |  +1650203
>>
>>
>> On Wed, Apr 3, 2019 at 3:53 PM Kenneth Knowles  wrote:
>>
>>> I think option 2 with n=1 minor version seems OK. So users get the
>>> message for one release and it is gone the next. We should make sure the
>>> deprecation warning says "this is an experimental feature, so it will be
>>> removed after 1 minor version". And we need a process for doing it so it
>>> doesn't sit around. I think we should also leave room for using our own
>>> judgment about whether the user pain is very little and then it is not
>>> needed to have a deprecation cycle.
>>>
>>> We might also want to get in the habit of reviewing if something should
>>> no longer be experimental.
>>>
>>> Kenn
>>>
>>> On Wed, Apr 3, 2019 at 2:33 PM Ismaël Mejía  wrote:
>>>
 When we did the first stable release of Beam (2.0.0) we decided to
 annotate most of the Beam IOs as @Experimental because we were
 cautious about not getting the APIs right in the first try. This was a
 really good decision because we could do serious improvements and
 refactorings to them in the first releases without the hassle of
 keeping backwards compatibility. However after some more releases
 users started to rely on features and supported versions, so we ended
 up in a situation where we could not change them arbitrarily without
 consequences to the final users.

 So we started to deprecate some features and parts of the API without
 removing them, e.g. the introduction of HadoopFormatIO deprecated
 HadoopInputFormatIO, we deprecated methods of MongoDbIO and MqttIO to
 improve the APIs (in most cases with valid/improved replacements), and
 recently it was discussed to removal of support for older versions in
 KafkaIO.

 Keeping deprecated stuff in experimental APIs does not seem to make
 sense, but it is what he have started to do to be ‘user friendly’, but
 it is probably a good moment to define, what should be the clear path
 for removal and breaking changes of experimental features, some
 options:

 1. Stay as we were, do not mark things as deprecated and remove them
 at will because this is the contract of @Experimental.
 2. Deprecate stuff and remove it after n versions (where n could be 3
 releases).
 3. Deprecate stuff and remove it just after a new LTS is decided to
 ensure users who need these features may still have them for some
 time.

 I would like to know your opinions about this, or if you have other
 ideas. Notice that in discussion I refer only to @Experimental
 features.

>>>


Re: kafka 0.9 support

2019-04-03 Thread Raghu Angadi
I mean, +1 for removing support for old Kafka versions after the next LTS.

What the cutoff for 'old' versions should be can be discussed then. My
choice would be 0.11.
Raghu.

On Wed, Apr 3, 2019 at 4:36 PM Raghu Angadi  wrote:

> +1 for next LTS.
>
> On Wed, Apr 3, 2019 at 2:30 PM Ismaël Mejía  wrote:
>
>> We should focus on the main reason to remove the Kafka 0.9 support. I
>> have the impression that this is mostly to ease the maintenance, but
>> from the current status (and the removal PR [1]), it does not seem
>> like it is a burden to continue supporting 0.9. In any case I am +1 to
>> remove the support for 0.9, but maybe it is a good idea to just wait
>> until the next LTS is decided and do it just after. This way we will
>> still cover existing users for some time.
>>
>> Creating different modules for different versions of KafkaIO does not
>> make sense because it is even more complicated than just staying the
>> way we are today for not much in return. We better improve the status
>> quo by parametrizing our current tests to validate that KafkaIO works
>> correctly with the different supported versions (so far we only test
>> against version 1.0.0). I filled BEAM-7003 to track this.
>>
>> [1] https://github.com/apache/beam/pull/8186
>> [2] https://issues.apache.org/jira/browse/BEAM-7003
>>
>> ps. Actually this discussion brings to the table the issue of
>> removing/deprecated/changing supported versions on parts of the API
>> marked as @Experimental. I will fork a new thread to discuss this.
>>
>> On Wed, Apr 3, 2019 at 6:53 PM Raghu Angadi  wrote:
>> >
>> >
>> >
>> > On Wed, Apr 3, 2019 at 5:46 AM David Morávek 
>> wrote:
>> >>
>> >> I'd say that APIs we use in KafkaIO are pretty much stable since 0.10
>> release, all reflection based compatibility adapters seem to be aimed for
>> 0.9 release (which is 8 major releases behind current Kafka release).
>> >>
>> >> We may take an inspiration from Flink's kafka connector, they maintain
>> separate maven artifact for all supported Kafka APIs. This may be the best
>> approach as we can still share most of the codebase between versions, have
>> compile time checks and also run tests against all of the supported
>> versions.
>> >
>> >
>> > From that page, Flink also moved to single Kafka connector for versions
>> 10.x and newer. Kafka itself seems to have improved compatibility between
>> client and broker versions starting 0.11. Not sure if there is any need now
>> to make multiple versions of KafkaIO versions for 0.9.x etc. Are you
>> suggesting we should?
>> >
>> > From Flink's page:
>> > "Starting with Flink 1.7, there is a new universal Kafka connector that
>> does not track a specific Kafka major version. Rather, it tracks the latest
>> version of Kafka at the time of the Flink release.
>> >
>> > If your Kafka broker version is 1.0.0 or newer, you should use this
>> Kafka connector. If you use an older version of Kafka (0.11, 0.10, 0.9, or
>> 0.8), you should use the connector corresponding to the broker version."
>> >
>> >
>> >>
>> >>
>> >> I'm not really comfortable with reflection based adapters as they seem
>> fragile and don't provide compile time checks.
>> >>
>> >> On Tue, Apr 2, 2019 at 11:27 PM Austin Bennett <
>> whatwouldausti...@gmail.com> wrote:
>> >>>
>> >>> I withdraw my concern -- checked on info on the cluster I will
>> eventually access.  It is on 0.8, so I was speaking too soon.  Can't speak
>> to rest of user base.
>> >>>
>> >>> On Tue, Apr 2, 2019 at 11:03 AM Raghu Angadi 
>> wrote:
>> 
>>  Thanks to David Morávek for pointing out possible improvement to
>> KafkaIO for dropping support for 0.9 since it avoids having a second
>> consumer just to fetch latest offsets for backlog.
>> 
>>  Ideally we should be dropping 0.9 support for next major release, in
>> fact better to drop versions before 0.10.1 at the same time. This would
>> further reduce reflection based calls for supporting multiple versions. If
>> the users still on 0.9 could stay on current stable release of Beam,
>> dropping would not affect them. Otherwise, it would be good to hear from
>> them about how long we need to keep support for old versions.
>> 
>>  I don't think it is good idea to have multiple forks of KafkaIO in
>> the same repo. If we do go that route, we should fork the entire kafka
>> directory and rename the main class KafkaIO_Unmaintained :).
>> 
>>  IMHO, so far, additional complexity for supporting these versions is
>> not that bad. Most of it is isolated to ConsumerSpEL.java &
>> ProducerSpEL.java.
>>  My first preference is dropping support for deprecated versions (and
>> a deprecate a few more versions, may be till the version that added
>> transactions around 0.11.x I think).
>> 
>>  I haven't looked into what's new in Kafka 2.x. Are there any
>> features that KafkaIO should take advantage of? I have not noticed our
>> existing code breaking. We should certainly support latest
>> releases of Kafka.

Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Reuven Lax
Our Experimental annotation has become almost useless. Many core,
widely-used parts of the API (e.g. triggers) are still all marked as
experimental. So many users use these features that we couldn't really
change them (in a backwards-incompatible way) without hurting many users, so
the fact that they are marked Experimental has become a fiction.

Could we add a deadline to the Experimental tag - a release version when it
will be removed? e.g.

@Experimental(2.15.0)

We can have a test that ensures the tag is removed at this version. Of
course if we're not ready to remove experimental by that version, it's fine
- we can always bump the tagged version. However this forces us to think
about each one.

Downside - it might add more toil to the existing release process.

Reuven
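As a rough sketch of how this deadline could combine with Ankur's group-name suggestion elsewhere in the thread, assuming plain string attributes rather than the existing Kind enum (hypothetical, not the current org.apache.beam.sdk.annotations.Experimental):

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD})
public @interface Experimental {
  /** Short label for the experiment, e.g. "SDF" or "Schemas". */
  String kind();

  /** Release by which the tag must be removed or re-justified, e.g. "2.15.0". */
  String removeBy();
}

The enforcement test could then scan annotated elements and fail the build once the current release reaches removeBy, with bumping removeBy as the explicit escape hatch.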


On Wed, Apr 3, 2019 at 4:00 PM Kyle Weaver  wrote:

> > We might also want to get in the habit of reviewing if something should
> no longer be experimental.
>
> +1
>
> Kyle Weaver |  Software Engineer |  kcwea...@google.com |  +1650203
>
>
> On Wed, Apr 3, 2019 at 3:53 PM Kenneth Knowles  wrote:
>
>> I think option 2 with n=1 minor version seems OK. So users get the
>> message for one release and it is gone the next. We should make sure the
>> deprecation warning says "this is an experimental feature, so it will be
>> removed after 1 minor version". And we need a process for doing it so it
>> doesn't sit around. I think we should also leave room for using our own
>> judgment about whether the user pain is very little and then it is not
>> needed to have a deprecation cycle.
>>
>> We might also want to get in the habit of reviewing if something should
>> no longer be experimental.
>>
>> Kenn
>>
>> On Wed, Apr 3, 2019 at 2:33 PM Ismaël Mejía  wrote:
>>
>>> When we did the first stable release of Beam (2.0.0) we decided to
>>> annotate most of the Beam IOs as @Experimental because we were
>>> cautious about not getting the APIs right in the first try. This was a
>>> really good decision because we could do serious improvements and
>>> refactorings to them in the first releases without the hassle of
>>> keeping backwards compatibility. However after some more releases
>>> users started to rely on features and supported versions, so we ended
>>> up in a situation where we could not change them arbitrarily without
>>> consequences to the final users.
>>>
>>> So we started to deprecate some features and parts of the API without
>>> removing them, e.g. the introduction of HadoopFormatIO deprecated
>>> HadoopInputFormatIO, we deprecated methods of MongoDbIO and MqttIO to
>>> improve the APIs (in most cases with valid/improved replacements), and
>>> recently it was discussed to removal of support for older versions in
>>> KafkaIO.
>>>
>>> Keeping deprecated stuff in experimental APIs does not seem to make
>>> sense, but it is what he have started to do to be ‘user friendly’, but
>>> it is probably a good moment to define, what should be the clear path
>>> for removal and breaking changes of experimental features, some
>>> options:
>>>
>>> 1. Stay as we were, do not mark things as deprecated and remove them
>>> at will because this is the contract of @Experimental.
>>> 2. Deprecate stuff and remove it after n versions (where n could be 3
>>> releases).
>>> 3. Deprecate stuff and remove it just after a new LTS is decided to
>>> ensure users who need these features may still have them for some
>>> time.
>>>
>>> I would like to know your opinions about this, or if you have other
>>> ideas. Notice that in discussion I refer only to @Experimental
>>> features.
>>>
>>


Re: kafka 0.9 support

2019-04-03 Thread Raghu Angadi
+1 for next LTS.

On Wed, Apr 3, 2019 at 2:30 PM Ismaël Mejía  wrote:

> We should focus on the main reason to remove the Kafka 0.9 support. I
> have the impression that this is mostly to ease the maintenance, but
> from the current status (and the removal PR [1]), it does not seem
> like it is a burden to continue supporting 0.9. In any case I am +1 to
> remove the support for 0.9, but maybe it is a good idea to just wait
> until the next LTS is decided and do it just after. This way we will
> still cover existing users for some time.
>
> Creating different modules for different versions of KafkaIO does not
> make sense because it is even more complicated than just staying the
> way we are today for not much in return. We better improve the status
> quo by parametrizing our current tests to validate that KafkaIO works
> correctly with the different supported versions (so far we only test
> against version 1.0.0). I filled BEAM-7003 to track this.
>
> [1] https://github.com/apache/beam/pull/8186
> [2] https://issues.apache.org/jira/browse/BEAM-7003
>
> ps. Actually this discussion brings to the table the issue of
> removing/deprecated/changing supported versions on parts of the API
> marked as @Experimental. I will fork a new thread to discuss this.
>
> On Wed, Apr 3, 2019 at 6:53 PM Raghu Angadi  wrote:
> >
> >
> >
> > On Wed, Apr 3, 2019 at 5:46 AM David Morávek 
> wrote:
> >>
> >> I'd say that APIs we use in KafkaIO are pretty much stable since 0.10
> release, all reflection based compatibility adapters seem to be aimed for
> 0.9 release (which is 8 major releases behind current Kafka release).
> >>
> >> We may take an inspiration from Flink's kafka connector, they maintain
> separate maven artifact for all supported Kafka APIs. This may be the best
> approach as we can still share most of the codebase between versions, have
> compile time checks and also run tests against all of the supported
> versions.
> >
> >
> > From that page, Flink also moved to single Kafka connector for versions
> 10.x and newer. Kafka itself seems to have improved compatibility between
> client and broker versions starting 0.11. Not sure if there is any need now
> to make multiple versions of KafkaIO versions for 0.9.x etc. Are you
> suggesting we should?
> >
> > From Flink's page:
> > "Starting with Flink 1.7, there is a new universal Kafka connector that
> does not track a specific Kafka major version. Rather, it tracks the latest
> version of Kafka at the time of the Flink release.
> >
> > If your Kafka broker version is 1.0.0 or newer, you should use this
> Kafka connector. If you use an older version of Kafka (0.11, 0.10, 0.9, or
> 0.8), you should use the connector corresponding to the broker version."
> >
> >
> >>
> >>
> >> I'm not really comfortable with reflection based adapters as they seem
> fragile and don't provide compile time checks.
> >>
> >> On Tue, Apr 2, 2019 at 11:27 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
> >>>
> >>> I withdraw my concern -- checked on info on the cluster I will
> eventually access.  It is on 0.8, so I was speaking too soon.  Can't speak
> to rest of user base.
> >>>
> >>> On Tue, Apr 2, 2019 at 11:03 AM Raghu Angadi  wrote:
> 
>  Thanks to David Morávek for pointing out possible improvement to
> KafkaIO for dropping support for 0.9 since it avoids having a second
> consumer just to fetch latest offsets for backlog.
> 
>  Ideally we should be dropping 0.9 support for next major release, in
> fact better to drop versions before 0.10.1 at the same time. This would
> further reduce reflection based calls for supporting multiple versions. If
> the users still on 0.9 could stay on current stable release of Beam,
> dropping would not affect them. Otherwise, it would be good to hear from
> them about how long we need to keep support for old versions.
> 
>  I don't think it is good idea to have multiple forks of KafkaIO in
> the same repo. If we do go that route, we should fork the entire kafka
> directory and rename the main class KafkaIO_Unmaintained :).
> 
>  IMHO, so far, additional complexity for supporting these versions is
> not that bad. Most of it is isolated to ConsumerSpEL.java &
> ProducerSpEL.java.
>  My first preference is dropping support for deprecated versions (and
> a deprecate a few more versions, may be till the version that added
> transactions around 0.11.x I think).
> 
>  I haven't looked into what's new in Kafka 2.x. Are there any features
> that KafkaIO should take advantage of? I have not noticed our existing code
> breaking. We should certainly certainly support latest releases of Kafka.
> 
>  Raghu.
> 
>  On Tue, Apr 2, 2019 at 10:27 AM Mingmin Xu 
> wrote:
> >
> >
> > We're still using Kafka 0.10 a lot, similar as 0.9 IMO. To expand
> multiple versions in KafkaIO is quite complex now, and it confuses users
> which is supported / which is not. I would prefer to support Kafka 2.0+
>

Re: [VOTE] Release 2.12.0, release candidate #1

2019-04-03 Thread Kenneth Knowles
I suggest keeping the bug open until the cherry-pick is complete. That
makes tracking the burndown easier and is a more accurate treatment of Fix
Version.

And from the other direction, a good practice is to check not only the Jira
burndown [1] but also to search for pull requests targeting the release
branch [2].

I normally wouldn't say anything since this is micro-process, but building
an RC can take some time, so it is worth putting extra effort into pre-RC
steps.

Kenn

[1] https://issues.apache.org/jira/projects/BEAM/versions/12344944
[2] https://github.com/apache/beam/pulls?q=is:open+base:release-2.12.0

On Wed, Apr 3, 2019 at 12:39 PM Ismaël Mejía  wrote:

> -1
>
> The release misses a cherry pick [1] that fixes an important issue in
> Cassandra, without this users won't be able to write to Cassandra. I
> know at least 3 users who are waiting for this release to have this
> fixed.
>
> [1] https://github.com/apache/beam/pull/8198/files
>
> On Wed, Apr 3, 2019 at 8:34 PM Andrew Pilloud  wrote:
> >
> > Hi everyone,
> >
> > Please review and vote on the release candidate #1 for the version
> 2.12.0, as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint
> 9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.12.0-RC1" [5],
> > * website pull request listing the release [6] and publishing the API
> reference manual [7].
> > * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> > * Validation sheet with a tab for 2.12.0 release to help with validation
> [8].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Andrew
> >
> > [1]
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344944
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1065/
> > [5] https://github.com/apache/beam/tree/v2.12.0-RC1
> > [6] https://github.com/apache/beam/pull/8215
> > [7] https://github.com/apache/beam-site/pull/588
> > [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984
>


Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Kyle Weaver
> We might also want to get in the habit of reviewing if something should
no longer be experimental.

+1

Kyle Weaver |  Software Engineer |  kcwea...@google.com |  +1650203


On Wed, Apr 3, 2019 at 3:53 PM Kenneth Knowles  wrote:

> I think option 2 with n=1 minor version seems OK. So users get the message
> for one release and it is gone the next. We should make sure the
> deprecation warning says "this is an experimental feature, so it will be
> removed after 1 minor version". And we need a process for doing it so it
> doesn't sit around. I think we should also leave room for using our own
> judgment about whether the user pain is very little and then it is not
> needed to have a deprecation cycle.
>
> We might also want to get in the habit of reviewing if something should no
> longer be experimental.
>
> Kenn
>
> On Wed, Apr 3, 2019 at 2:33 PM Ismaël Mejía  wrote:
>
>> When we did the first stable release of Beam (2.0.0) we decided to
>> annotate most of the Beam IOs as @Experimental because we were
>> cautious about not getting the APIs right in the first try. This was a
>> really good decision because we could do serious improvements and
>> refactorings to them in the first releases without the hassle of
>> keeping backwards compatibility. However after some more releases
>> users started to rely on features and supported versions, so we ended
>> up in a situation where we could not change them arbitrarily without
>> consequences to the final users.
>>
>> So we started to deprecate some features and parts of the API without
>> removing them, e.g. the introduction of HadoopFormatIO deprecated
>> HadoopInputFormatIO, we deprecated methods of MongoDbIO and MqttIO to
>> improve the APIs (in most cases with valid/improved replacements), and
>> recently it was discussed to removal of support for older versions in
>> KafkaIO.
>>
>> Keeping deprecated stuff in experimental APIs does not seem to make
>> sense, but it is what he have started to do to be ‘user friendly’, but
>> it is probably a good moment to define, what should be the clear path
>> for removal and breaking changes of experimental features, some
>> options:
>>
>> 1. Stay as we were, do not mark things as deprecated and remove them
>> at will because this is the contract of @Experimental.
>> 2. Deprecate stuff and remove it after n versions (where n could be 3
>> releases).
>> 3. Deprecate stuff and remove it just after a new LTS is decided to
>> ensure users who need these features may still have them for some
>> time.
>>
>> I would like to know your opinions about this, or if you have other
>> ideas. Notice that in discussion I refer only to @Experimental
>> features.
>>
>


Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Kenneth Knowles
I think option 2 with n=1 minor version seems OK. So users get the message
for one release and it is gone the next. We should make sure the
deprecation warning says "this is an experimental feature, so it will be
removed after 1 minor version". And we need a process for doing it so it
doesn't sit around. I think we should also leave room for using our own
judgment about cases where the user pain is very small and a deprecation
cycle is not needed.

We might also want to get in the habit of reviewing if something should no
longer be experimental.

Kenn

On Wed, Apr 3, 2019 at 2:33 PM Ismaël Mejía  wrote:

> When we did the first stable release of Beam (2.0.0) we decided to
> annotate most of the Beam IOs as @Experimental because we were
> cautious about not getting the APIs right in the first try. This was a
> really good decision because we could do serious improvements and
> refactorings to them in the first releases without the hassle of
> keeping backwards compatibility. However after some more releases
> users started to rely on features and supported versions, so we ended
> up in a situation where we could not change them arbitrarily without
> consequences to the final users.
>
> So we started to deprecate some features and parts of the API without
> removing them, e.g. the introduction of HadoopFormatIO deprecated
> HadoopInputFormatIO, we deprecated methods of MongoDbIO and MqttIO to
> improve the APIs (in most cases with valid/improved replacements), and
> recently it was discussed to removal of support for older versions in
> KafkaIO.
>
> Keeping deprecated stuff in experimental APIs does not seem to make
> sense, but it is what he have started to do to be ‘user friendly’, but
> it is probably a good moment to define, what should be the clear path
> for removal and breaking changes of experimental features, some
> options:
>
> 1. Stay as we were, do not mark things as deprecated and remove them
> at will because this is the contract of @Experimental.
> 2. Deprecate stuff and remove it after n versions (where n could be 3
> releases).
> 3. Deprecate stuff and remove it just after a new LTS is decided to
> ensure users who need these features may still have them for some
> time.
>
> I would like to know your opinions about this, or if you have other
> ideas. Notice that in discussion I refer only to @Experimental
> features.
>


[DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Ismaël Mejía
When we did the first stable release of Beam (2.0.0) we decided to
annotate most of the Beam IOs as @Experimental because we were
cautious about not getting the APIs right on the first try. This was a
really good decision because we could do serious improvements and
refactorings to them in the first releases without the hassle of
keeping backwards compatibility. However, after some more releases
users started to rely on features and supported versions, so we ended
up in a situation where we could not change them arbitrarily without
consequences for the final users.

So we started to deprecate some features and parts of the API without
removing them, e.g. the introduction of HadoopFormatIO deprecated
HadoopInputFormatIO, we deprecated methods of MongoDbIO and MqttIO to
improve the APIs (in most cases with valid/improved replacements), and
recently the removal of support for older versions in KafkaIO was
discussed.

Keeping deprecated stuff in experimental APIs does not seem to make
sense, but it is what we have started to do to be ‘user friendly’. It
is probably a good moment to define what the clear path for removal
and breaking changes of experimental features should be. Some
options:

1. Stay as we were: do not mark things as deprecated, and remove them
at will because this is the contract of @Experimental.
2. Deprecate stuff and remove it after n versions (where n could be 3 releases).
3. Deprecate stuff and remove it just after a new LTS is decided, to
ensure users who need these features may still have them for some
time.

I would like to know your opinions about this, or if you have other
ideas. Notice that in this discussion I refer only to @Experimental
features.


Re: kafka 0.9 support

2019-04-03 Thread Ismaël Mejía
We should focus on the main reason to remove the Kafka 0.9 support. I
have the impression that this is mostly to ease the maintenance, but
from the current status (and the removal PR [1]), it does not seem
like it is a burden to continue supporting 0.9. In any case I am +1 to
remove the support for 0.9, but maybe it is a good idea to just wait
until the next LTS is decided and do it just after. This way we will
still cover existing users for some time.

Creating different modules for different versions of KafkaIO does not
make sense because it is even more complicated than just staying the
way we are today, for not much in return. We had better improve the status
quo by parameterizing our current tests to validate that KafkaIO works
correctly with the different supported versions (so far we only test
against version 1.0.0). I filed BEAM-7003 [2] to track this.

[1] https://github.com/apache/beam/pull/8186
[2] https://issues.apache.org/jira/browse/BEAM-7003

P.S. Actually this discussion brings to the table the issue of
removing/deprecating/changing supported versions on parts of the API
marked as @Experimental. I will fork a new thread to discuss this.
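A rough sketch of the test parameterization suggested for BEAM-7003, assuming for illustration that a JUnit parameter can select the client version; in practice this more likely means separate Gradle test configurations with different kafka-clients dependencies:

import java.util.Arrays;
import java.util.Collection;
import org.junit.Assume;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class KafkaIOVersionCompatibilityTest {

  @Parameters(name = "kafka-clients {0}")
  public static Collection<Object[]> versions() {
    // Candidate versions only; the real list is whatever KafkaIO claims to support.
    return Arrays.asList(new Object[][] {{"0.10.2.2"}, {"0.11.0.3"}, {"1.0.0"}, {"2.0.0"}});
  }

  private final String kafkaClientVersion;

  public KafkaIOVersionCompatibilityTest(String kafkaClientVersion) {
    this.kafkaClientVersion = kafkaClientVersion;
  }

  @Test
  public void readWriteRoundTrips() {
    // Placeholder: the real test would start an embedded broker for this version
    // and run the existing KafkaIO read/write pipelines against it.
    Assume.assumeTrue(!kafkaClientVersion.isEmpty());
  }
}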

On Wed, Apr 3, 2019 at 6:53 PM Raghu Angadi  wrote:
>
>
>
> On Wed, Apr 3, 2019 at 5:46 AM David Morávek  wrote:
>>
>> I'd say that APIs we use in KafkaIO are pretty much stable since 0.10 
>> release, all reflection based compatibility adapters seem to be aimed for 
>> 0.9 release (which is 8 major releases behind current Kafka release).
>>
>> We may take an inspiration from Flink's kafka connector, they maintain 
>> separate maven artifact for all supported Kafka APIs. This may be the best 
>> approach as we can still share most of the codebase between versions, have 
>> compile time checks and also run tests against all of the supported versions.
>
>
> From that page, Flink also moved to single Kafka connector for versions 10.x 
> and newer. Kafka itself seems to have improved compatibility between client 
> and broker versions starting 0.11. Not sure if there is any need now to make 
> multiple versions of KafkaIO versions for 0.9.x etc. Are you suggesting we 
> should?
>
> From Flink's page:
> "Starting with Flink 1.7, there is a new universal Kafka connector that does 
> not track a specific Kafka major version. Rather, it tracks the latest 
> version of Kafka at the time of the Flink release.
>
> If your Kafka broker version is 1.0.0 or newer, you should use this Kafka 
> connector. If you use an older version of Kafka (0.11, 0.10, 0.9, or 0.8), 
> you should use the connector corresponding to the broker version."
>
>
>>
>>
>> I'm not really comfortable with reflection based adapters as they seem 
>> fragile and don't provide compile time checks.
>>
>> On Tue, Apr 2, 2019 at 11:27 PM Austin Bennett  
>> wrote:
>>>
>>> I withdraw my concern -- checked on info on the cluster I will eventually 
>>> access.  It is on 0.8, so I was speaking too soon.  Can't speak to rest of 
>>> user base.
>>>
>>> On Tue, Apr 2, 2019 at 11:03 AM Raghu Angadi  wrote:

 Thanks to David Morávek for pointing out possible improvement to KafkaIO 
 for dropping support for 0.9 since it avoids having a second consumer just 
 to fetch latest offsets for backlog.

 Ideally we should be dropping 0.9 support for next major release, in fact 
 better to drop versions before 0.10.1 at the same time. This would further 
 reduce reflection based calls for supporting multiple versions. If the 
 users still on 0.9 could stay on current stable release of Beam, dropping 
 would not affect them. Otherwise, it would be good to hear from them about 
 how long we need to keep support for old versions.

 I don't think it is good idea to have multiple forks of KafkaIO in the 
 same repo. If we do go that route, we should fork the entire kafka 
 directory and rename the main class KafkaIO_Unmaintained :).

 IMHO, so far, additional complexity for supporting these versions is not 
 that bad. Most of it is isolated to ConsumerSpEL.java & ProducerSpEL.java.
 My first preference is dropping support for deprecated versions (and a 
 deprecate a few more versions, may be till the version that added 
 transactions around 0.11.x I think).

 I haven't looked into what's new in Kafka 2.x. Are there any features that 
 KafkaIO should take advantage of? I have not noticed our existing code 
 breaking. We should certainly certainly support latest releases of Kafka.

 Raghu.

 On Tue, Apr 2, 2019 at 10:27 AM Mingmin Xu  wrote:
>
>
> We're still using Kafka 0.10 a lot, similar as 0.9 IMO. To expand 
> multiple versions in KafkaIO is quite complex now, and it confuses users 
> which is supported / which is not. I would prefer to support Kafka 2.0+ 
> only in the latest version. For old versions, there're some options:
> 1). document Kafka-Beam support versions, like what we do in Flink

Projects Can Apply Individually for Google Season of Docs

2019-04-03 Thread sharan

Hi All

Initially the ASF as an organisation was planning to apply as a
mentoring organisation for Google Season of Docs on behalf of all Apache
projects, but if accepted, the maximum number of technical writers we
could be allocated is two. Two technical writers would probably not be
enough to cover the potential demand from all our projects interested in
participating.


We've received feedback from Google that individual projects can apply. 
I will withdraw the ASF application so that any Apache project 
interested can apply individually for Season of Docs and so have the 
potential of being allocated a technical writer.


Applications for Season of Docs are open now and close on 23rd April
2019. If your project would like to apply, please see the following
link:


https://developers.google.com/season-of-docs/docs/get-started/

Good luck everyone!

Thanks
Sharan




Re: [VOTE] Release 2.12.0, release candidate #1

2019-04-03 Thread Ismaël Mejía
-1

The release misses a cherry-pick [1] that fixes an important issue in
Cassandra; without it, users won't be able to write to Cassandra. I
know at least 3 users who are waiting for this release to have this
fixed.

[1] https://github.com/apache/beam/pull/8198/files

On Wed, Apr 3, 2019 at 8:34 PM Andrew Pilloud  wrote:
>
> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 2.12.0, as 
> follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2], 
> which is signed with the key with fingerprint 
> 9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.12.0-RC1" [5],
> * website pull request listing the release [6] and publishing the API 
> reference manual [7].
> * Python artifacts are deployed along with the source release to the 
> dist.apache.org [2].
> * Validation sheet with a tab for 2.12.0 release to help with validation [8].
>
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Andrew
>
> [1] 
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344944
> [2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1065/
> [5] https://github.com/apache/beam/tree/v2.12.0-RC1
> [6] https://github.com/apache/beam/pull/8215
> [7] https://github.com/apache/beam-site/pull/588
> [8] 
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984


Re: ParDo Execution Time stat is always 0

2019-04-03 Thread Thomas Weise
I believe this is where the metrics are supplied:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py

git grep process_bundle_msecs   yields results for dataflow worker only

There isn't any test coverage for the Flink runner:

https://github.com/apache/beam/blob/d38645ae8758d834c3e819b715a66dd82c78f6d4/sdks/python/apache_beam/runners/portability/flink_runner_test.py#L181



On Wed, Apr 3, 2019 at 10:45 AM Akshay Balwally  wrote:

> Should have added: I'm using the Python SDK, Flink runner
>
> On Wed, Apr 3, 2019 at 10:32 AM Akshay Balwally 
> wrote:
>
>> Hi,
>> I'm hoping to get metrics on the amount of time spent on each operator,
>> so it seems like the stat
>>
>>
>> {organization_specific_prefix}.operator.beam-metric-pardo_execution_time-process_bundle_msecs-v1.gauge.mean
>>
>> would be pretty helpful. But in practice, this stat always shows 0, which
>> I interpret as 0 milliseconds spent per bundle, which can't be correct
>> (other stats show that the operators are running, and timers within the
>> operators show more reasonable times). Is this a known bug?
>>
>>
>> --
>> *Akshay Balwally*
>> Software Engineer
>> 937.271.6469 <+19372716469>
>>
>
>
> --
> *Akshay Balwally*
> Software Engineer
> 937.271.6469 <+19372716469>
>


[VOTE] Release 2.12.0, release candidate #1

2019-04-03 Thread Andrew Pilloud
Hi everyone,

Please review and vote on the release candidate #1 for the version 2.12.0,
as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.12.0-RC1" [5],
* website pull request listing the release [6] and publishing the API
reference manual [7].
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.12.0 release to help with validation
[8].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Andrew

[1]
https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344944
[2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1065/
[5] https://github.com/apache/beam/tree/v2.12.0-RC1
[6] https://github.com/apache/beam/pull/8215
[7] https://github.com/apache/beam-site/pull/588
[8]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984


Re: Increase Portable SDK Harness share of memory?

2019-04-03 Thread Lukasz Cwik
Turns out much of the work was completed to populate and consume the urn +
payloads.

I have deprecated the single "url" field in Environment with
https://github.com/apache/beam/pull/8213, which will allow us to close
out BEAM-5433.
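
For reference, a minimal sketch of what the URN + payload model looks like
from the Java side, assuming the generated RunnerApi classes from
beam_runner_api.proto and the beam:env:docker:v1 URN from
StandardEnvironments; the container image name below is only a placeholder.

import org.apache.beam.model.pipeline.v1.RunnerApi;

public class DockerEnvironmentSketch {
  public static RunnerApi.Environment dockerEnvironment() {
    // The docker-specific settings live in a DockerPayload rather than the
    // deprecated "url" field.
    return RunnerApi.Environment.newBuilder()
        .setUrn("beam:env:docker:v1")
        .setPayload(
            RunnerApi.DockerPayload.newBuilder()
                .setContainerImage("example/beam-sdk-harness:latest") // placeholder image name
                .build()
                .toByteString())
        .build();
  }
}

A process-based or "existing" environment would follow the same shape with
its own URN and payload message, which is what makes the model versionable.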

On Mon, Apr 1, 2019 at 1:48 PM Lukasz Cwik  wrote:

> Yes, need to use the new fields everywhere and then deprecate the old
> fields.
>
> On Mon, Apr 1, 2019 at 1:33 PM Kenneth Knowles  wrote:
>
>>
>>
>> On Mon, Apr 1, 2019 at 8:59 AM Lukasz Cwik  wrote:
>>
>>> To clarify, docker isn't the only environment type we are using. We have
>>> process-based and "existing" environment modes that don't fit the current
>>> protobuf and are being worked around.
>>>
>>
>> Ah, understood.
>>
>>
>>> The idea would be to move to a URN + payload model like our PTransforms
>>> and coders with a docker specific one. Using the URN + payload would allow
>>> us to have a versioned way to update the environment specifications and
>>> deprecate/remove things that are ill defined.
>>>
>>
>> Makes sense to me. It looks like this migration path is already in place
>> in `message Environment` in beam_runner_api.proto, with `message
>> StandardEnvironments` enumerating some URNs and corresponding payload
>> messages just below. So is the gap just getting the two portable runners to
>> look at the new fields?
>>
>> Kenn
>>
>>
>>> On Fri, Mar 29, 2019 at 6:41 PM Kenneth Knowles  wrote:
>>>


 On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik  wrote:

> The intention is that these kinds of hints such as CPU and/or memory
> should be embedded in the environment specification that is associated 
> with
> the transforms that need resource hints.
>
> The environment spec is woefully ill prepared as it only has a docker
> URL right now.
>

 FWIW I think this is actually "extremely well prepared" :-)

 Protobuf is great for adding fields when you need more but removing is
 nearly impossible once deployed, so it is best to do the absolute minimum
 until you need to expand.

 Kenn


>
> On Thu, Mar 28, 2019 at 8:45 AM Robert Burke 
> wrote:
>
>> A question came over the beam-go slack that I wasn't able to answer,
>> in particular for Dataflow*, is there a way to increase how much of a
>> Portable FnAPI worker is dedicated for the SDK side, vs the Runner side?
>>
>> My assumption is that runners should manage it, and have the Runner
>> Harness side be as lightweight as possible, to operate under reasonable
>> memory bounds, allowing the user-code more room to spread, since it's
>> largely unknown.
>>
>> I saw there's the Provisioning API
>> 
>> which communicates resource limits to the SDK side, but is there a way
>> to make the request (probably on job start up) in the other direction?
>>
>> I imagine it has to do with the container boot code, but I have only
>> vague knowledge of how that works at present.
>>
>> If there's a portable way for it, that's ideal, but I suspect this
>> will require a Dataflow-specific answer.
>>
>> Thanks!
>> Robert B
>>
>> *Dataflow doesn't support the Go SDK, but the Go SDK supports
>> Dataflow.
>>
>


Re: ParDo Execution Time stat is always 0

2019-04-03 Thread Akshay Balwally
Should have added: I'm using the Python SDK, Flink runner

On Wed, Apr 3, 2019 at 10:32 AM Akshay Balwally  wrote:

> Hi,
> I'm hoping to get metrics on the amount of time spent on each operator, so
> it seems like the stat
>
>
> {organization_specific_prefix}.operator.beam-metric-pardo_execution_time-process_bundle_msecs-v1.gauge.mean
>
> would be pretty helpful. But in practice, this stat always shows 0, which
> I interpret as 0 milliseconds spent per bundle, which can't be correct
> (other stats show that the operators are running, and timers within the
> operators show more reasonable times). Is this a known bug?
>
>
> --
> *Akshay Balwally*
> Software Engineer
> 937.271.6469 <+19372716469>
>


-- 
*Akshay Balwally*
Software Engineer
937.271.6469 <+19372716469>


ParDo Execution Time stat is always 0

2019-04-03 Thread Akshay Balwally
Hi,
I'm hoping to get metrics on the amount of time spent on each operator, so
it seems like the stat

{organization_specific_prefix}.operator.beam-metric-pardo_execution_time-process_bundle_msecs-v1.gauge.mean

would be pretty helpful. But in practice, this stat always shows 0, which I
interpret as 0 milliseconds spent per bundle, which can't be correct (other
stats show that the operators are running, and timers within the operators
show more reasonable times). Is this a known bug?


-- 
*Akshay Balwally*
Software Engineer
937.271.6469 <+19372716469>


Re: kafka 0.9 support

2019-04-03 Thread Raghu Angadi
On Wed, Apr 3, 2019 at 5:46 AM David Morávek 
wrote:

> I'd say that APIs we use in KafkaIO are pretty much stable since 0.10
> release, all reflection based compatibility adapters seem to be aimed for
> 0.9 release (which is 8 major releases behind current Kafka release).
>
> We may take an inspiration from Flink's kafka connector
> ,
> they maintain separate maven artifacts for all supported Kafka APIs. This
> may be the best approach as we can still share most of the codebase between
> versions, have compile time checks and also run tests against all of the
> supported versions.
>

From that page, Flink also moved to a single Kafka connector for versions
10.x and newer. Kafka itself seems to have improved compatibility between
client and broker versions starting with 0.11. Not sure if there is any need now
to make multiple versions of KafkaIO for 0.9.x etc. Are you
suggesting we should?

From Flink's page:
"Starting with Flink 1.7, there is a new universal Kafka connector that
does not track a specific Kafka major version. Rather, it tracks the latest
version of Kafka at the time of the Flink release.

If your Kafka broker version is 1.0.0 or newer, you should use this Kafka
connector. If you use an older version of Kafka (0.11, 0.10, 0.9, or 0.8),
you should use the connector corresponding to the broker version."



>
> I'm not really comfortable with reflection based adapters
> 
> as they seem fragile and don't provide compile time checks.
>
> On Tue, Apr 2, 2019 at 11:27 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> I withdraw my concern -- checked on info on the cluster I will eventually
>> access.  It is on 0.8, so I was speaking too soon.  Can't speak to rest of
>> user base.
>>
>> On Tue, Apr 2, 2019 at 11:03 AM Raghu Angadi  wrote:
>>
>>> Thanks to David Morávek for pointing out possible improvement to KafkaIO
>>> for dropping support for 0.9 since it avoids having a second consumer just
>>> to fetch latest offsets for backlog.
>>>
>>> Ideally we should be dropping 0.9 support for next major release, in
>>> fact better to drop versions before 0.10.1 at the same time. This would
>>> further reduce reflection based calls for supporting multiple versions. If
>>> the users still on 0.9 could stay on current stable release of Beam,
>>> dropping would not affect them. Otherwise, it would be good to hear from
>>> them about how long we need to keep support for old versions.
>>>
>>> I don't think it is a good idea to have multiple forks of KafkaIO in the
>>> same repo. If we do go that route, we should fork the entire kafka
>>> directory and rename the main class KafkaIO_Unmaintained :).
>>>
>>> IMHO, so far, additional complexity for supporting these versions is not
>>> that bad. Most of it is isolated to ConsumerSpEL.java & ProducerSpEL.java.
>>> My first preference is dropping support for deprecated versions (and
>>> deprecating a few more versions, maybe till the version that added
>>> transactions around 0.11.x I think).
>>>
>>> I haven't looked into what's new in Kafka 2.x. Are there any features
>>> that KafkaIO should take advantage of? I have not noticed our existing code
>>> breaking. We should certainly support the latest releases of Kafka.
>>>
>>> Raghu.
>>>
>>> On Tue, Apr 2, 2019 at 10:27 AM Mingmin Xu  wrote:
>>>

 We're still using Kafka 0.10 a lot, similar as 0.9 IMO. To expand
 multiple versions in KafkaIO is quite complex now, and it confuses users
 which is supported / which is not. I would prefer to support Kafka 2.0+
 only in the latest version. For old versions, there're some options:
 1). document Kafka-Beam support versions, like what we do in
 FlinkRunner;
 2). maintain separated KafkaIOs for old versions;

 1) would be easy to maintain, and I assume there should be no issue to
 use Beam-Core 3.0 together with KafkaIO 2.0.

 Any thoughts?

 Mingmin

 On Tue, Apr 2, 2019 at 9:56 AM Reuven Lax  wrote:

> KafkaIO is marked as Experimental, and the comment already warns that
> 0.9 support might be removed. I think that if users still rely on Kafka 
> 0.9
> we should leave a fork (renamed) of the IO in the tree for 0.9, but we can
> definitely remove 0.9 support from the main IO if we want, especially if
> it's complicated changes to that IO. If we do though, we should fail with 
> a
> clear error message telling users to use the Kafka 0.9 IO.
>
> On Tue, Apr 2, 2019 at 9:34 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> > How are multiple versions of Kafka supported? Are they all in one
>> client, or is there a case for forks like ElasticSearchIO?
>>
>> They are supported in one client but we have additional "ConsumerSpEL"
>> adapter which unifies interface differences among different Kafka client
>> versions (mostly to support old ones 0.9-0.10.0).

Re: Implementation an S3 file system for python SDK

2019-04-03 Thread Pablo Estrada
Hi Pasan!
Thanks for the proposal. I'll try to take a look in the next few hours and
give some feedback.
Best
--P.

On Wed, Apr 3, 2019, 8:53 AM Ahmet Altay  wrote:

> +Pablo Estrada 
>
> On Wed, Apr 3, 2019 at 8:46 AM Lukasz Cwik  wrote:
>
>> +dev 
>>
>> On Wed, Apr 3, 2019 at 2:03 AM Pasan Kamburugamuwa <
>> pasankamburugamu...@gmail.com> wrote:
>>
>>> Hi ,
>>>
>>> I have completed a proposal to implement an S3 file system for the
>>> Python SDK for Google Summer of Code 2019. Please can you review
>>> this proposal, and if there are any issues with it, let me know.
>>> Here is the link to the project proposal -
>>>
>>> https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-8/edit?usp=sharing
>>>
>>> Thank you
>>> Pasan Kamburugamuwa
>>>
>>>


Re: [Forked] BEAM-4046 (was [PROPOSAL] Introduce beam-sdks-java gradle project)

2019-04-03 Thread Lukasz Cwik
As a minor point, we do have some cross language dependencies, for example:
* the portability related proto projects are intended to be consumed by Go,
Java and Python
* the docker container gradle projects contain other applications (e.g. go
boot code) that are placed inside the docker container that contain the
language specific SDK harness. There will likely be additional applications
that are separate from the SDK harness like a docker container health
checker that are placed in there as well

On Tue, Apr 2, 2019 at 3:21 PM Michael Luckey  wrote:

> Hi,
>
> agree with Kenn, that this issue at least renders the default
> implementation difficult to use.
>
> Although in the example given, i.e. having  sdks/java/core and
> sdks/py/core, I am unsure, whether it will impose a problem.
>
> As far as I understand until now, the issue triggers on dependency
> declaration. These are - in general - expressed with 3 dimensional maven
> coordinates: GroupID, artifactID and version. Of course - the semantics of
> version are clear - so there are only 2 dimensions left to distinguish
> artefacts. As we use a single group id (org.apache.beam) there is only one
> dimension left.
>
> Now this does not impose a problem on plain library dependencies. Of
> course they are uniquely defined. But we are also using lots of project
> dependencies. These project dependencies are translated from project path to
> those maven coordinates. Unfortunately here the project name - which
> happens to be the folder name - is used as artefact id. So if folder names
> match, we might get collisions during dependency resolution.
>
> Clearly, it is not possible to deploy artefacts with those same ids to any
> maven repo expecting sensible results. So we either do not deploy an
> artefact from one of these projects - which would be kind of strange as we do
> have a project dependency here - or do rewrite the artefact id of (at
> least) one of the colliding projects. (we currently do that implicitly
> with the project name we create by flattening our structure)
>
> Now back to the given example, as I do not expect any java project to have
> a project dependency on python, there might be a chance, that this will
> just work.
>
> But of course, this does not really help, as we reasonably might expect
> some /runner/direct/core or sdks/java/io/someio/core which would collide in
> the same way.
>
> As a possible workaround here, we could
> - either require unique folder names
> - or rewrite only colliding project names (as we currently do for all
> projects)
> - or ... (do not know yet)
>
> I suggest I'll invest some time playing around with and improving that already
> prepared PR to support the discussion, so that we have proper grounding to
> decide whether a more hierarchical project structure will be worth that
> hassle.
>
> Looking at the gradle issue - which is already 2 yrs old and iirc was
> reported already at least one year earlier - I do not expect a fix here
> soon.
>
> On Tue, Apr 2, 2019 at 7:19 PM Lukasz Cwik  wrote:
>
>> I didn't know that https://github.com/gradle/gradle/issues/847 existed
>> but the description of the issues people are having is similar to what was
>> discovered during the gradle migration.
>>
>> On Tue, Apr 2, 2019 at 8:02 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi Michael,
>>>
>>> no problem for the thread, that's the goal of the mailing list ;)
>>>
>>> And yes, you got my idea about a "meta" module: easy way of building the
>>> "whole" Java SDK.
>>>
>>> The purpose is not to create a uber jar, but more to simplify the build
>>> for Java SDK developers.
>>>
>>> Do you want me to complete your PR with what I did ?
>>>
>>> Regards
>>> JB
>>>
>>> On 02/04/2019 16:49, Michael Luckey wrote:
>>> > Going to fork the BEAM-4046 discussion. And, JB, I apologise for
>>> > hijacking your thread.
>>> >
>>> > As for the original question, I understood a request for a meta project
>>> > which will enable easier handling of java projects. E.g. instead of
>>> > requiring the user to call
>>> >
>>> > ./gradlew module1:build module2:build ... moduleN.build
>>> >
>>> > a meta project with a build task defined something like
>>> >
>>> > build.dependsOn module1:build
>>> > build.dependsOn module2:build
>>> > ...
>>> > build.dependsOn moduleN:build
>>> >
>>> > And other tasks as found usable.
>>> >
>>> > Not a project which in itself creates some uberjar, which I also
>>> believe
>>> > would be rather difficult to implement.
>>> >
>>> > On Tue, Apr 2, 2019 at 5:13 AM Kenneth Knowles wrote:
>>> >
>>> > Oh, yikes. It seems
>>> > like https://github.com/gradle/gradle/issues/847 indicates that
>>> the
>>> > feature to use the default names in Gradle is practically
>>> > nonfunctional. If that bug is as severe as it looks, I have to
>>> > retract my position. Like we could never have sdks/java/core and
>>> > sdks/py/core, right?
>>> >
>>> > Kenn
>>> >
>>> > On Mon, Apr 1, 2019 at 6:27 PM Michael Luck

Re: Implementation an S3 file system for python SDK

2019-04-03 Thread Ahmet Altay
+Pablo Estrada 

On Wed, Apr 3, 2019 at 8:46 AM Lukasz Cwik  wrote:

> +dev 
>
> On Wed, Apr 3, 2019 at 2:03 AM Pasan Kamburugamuwa <
> pasankamburugamu...@gmail.com> wrote:
>
>> Hi ,
>>
>> I have completed a proposal to implement an S3 file system for the
>> Python SDK for Google Summer of Code 2019. Please can you review
>> this proposal, and if there are any issues with it, let me know.
>> Here is the link to the project proposal -
>>
>> https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-8/edit?usp=sharing
>>
>> Thank you
>> Pasan Kamburugamuwa
>>
>>


Re: Implementation an S3 file system for python SDK

2019-04-03 Thread Lukasz Cwik
+dev 

On Wed, Apr 3, 2019 at 2:03 AM Pasan Kamburugamuwa <
pasankamburugamu...@gmail.com> wrote:

> Hi ,
>
> I have completed a proposal to implement an S3 file system for the Python
> SDK for Google Summer of Code 2019. Please can you review this
> proposal, and if there are any issues with it, let me know.
> Here is the link to the project proposal -
>
> https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-8/edit?usp=sharing
>
> Thank you
> Pasan Kamburugamuwa
>
>


Re: Quieten javadoc generation

2019-04-03 Thread Maximilian Michels

+1

On 02.04.19 22:56, Mikhail Gryzykhin wrote:
+1 to suppress warnings globally. If we care about an issue, it should 
be an error.


On Tue, Apr 2, 2019 at 5:38 AM Alexey Romanenko 
<aromanenko@gmail.com> wrote:


+1 to suppress such warnings globally. IMO, usually, meaningful
Javadoc description is quite enough to understand what this method does.


On 1 Apr 2019, at 18:21, Kenneth Knowles <k...@apache.org> wrote:

Personally, I would like to suppress the warnings globally. I
think requiring javadoc everywhere is already enough to remind
someone to write something meaningful. And I think @param rarely
adds anything beyond the function signature and @return rarely
adds anything beyond the description.

Kenn

On Mon, Apr 1, 2019 at 6:53 AM Michael Luckey <adude3...@gmail.com> wrote:

Hi,

currently our console output gets cluttered by thousands of
Javadoc warnings [1]. Most of them are warnings caused by
missing @return or @param tags [2].

So currently, this signal is completely ignored, and even
worse, makes it difficult to parse through the log.

As I could not find a previous discussion on the list on how
to handle param/return on java docs, I felt the need to ask
here first, how we would like to improve this situation.

Some options
1. fix those warnings
2. do not insist on those tags being present and disable
doclint warnings (probably not doable on tag granularity).
This is already done on doc aggregation task [3]

Thoughts?


[1]
https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/console
[2]
https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/java/
[3]

https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle#L77-L78





Re: kafka 0.9 support

2019-04-03 Thread David Morávek
I'd say that APIs we use in KafkaIO are pretty much stable since 0.10
release, all reflection based compatibility adapters seem to be aimed for
0.9 release (which is 8 major releases behind current Kafka release).

We may take an inspiration from Flink's kafka connector
,
they maintain separate maven artifacts for all supported Kafka APIs. This
may be the best approach as we can still share most of the codebase between
versions, have compile time checks and also run tests against all of the
supported versions.

I'm not really comfortable with reflection based adapters

as they seem fragile and don't provide compile time checks.
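
(For readers without the KafkaIO source at hand, the pattern being discussed
looks roughly like the sketch below. This is illustrative only and not the
actual ConsumerSpEL code; offsetsForTimes is a real Consumer method that only
exists in kafka-clients 0.10.1+, so older clients need a reflective lookup
and a runtime fallback.)

import java.lang.reflect.Method;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;

// Illustrative sketch of a reflection-based compatibility adapter.
class ConsumerCompat {
  private static final Method OFFSETS_FOR_TIMES = lookupOffsetsForTimes();

  private static Method lookupOffsetsForTimes() {
    try {
      // offsetsForTimes(Map) was added in kafka-clients 0.10.1.
      return Consumer.class.getMethod("offsetsForTimes", Map.class);
    } catch (NoSuchMethodException e) {
      return null; // running against a pre-0.10.1 kafka-clients jar
    }
  }

  static Map<?, ?> offsetsForTimes(Consumer<?, ?> consumer, Map<?, Long> timestampsToSearch) {
    if (OFFSETS_FOR_TIMES == null) {
      // No compile-time check catches this; callers only find out at runtime.
      throw new UnsupportedOperationException(
          "offsetsForTimes requires kafka-clients 0.10.1 or newer");
    }
    try {
      return (Map<?, ?>) OFFSETS_FOR_TIMES.invoke(consumer, timestampsToSearch);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("offsetsForTimes failed", e);
    }
  }
}

The fragility is visible here: the method name is a plain string, so a renamed
or re-signatured API only fails at runtime rather than at compile time.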

On Tue, Apr 2, 2019 at 11:27 PM Austin Bennett 
wrote:

> I withdraw my concern -- checked on info on the cluster I will eventually
> access.  It is on 0.8, so I was speaking too soon.  Can't speak to rest of
> user base.
>
> On Tue, Apr 2, 2019 at 11:03 AM Raghu Angadi  wrote:
>
>> Thanks to David Morávek for pointing out possible improvement to KafkaIO
>> for dropping support for 0.9 since it avoids having a second consumer just
>> to fetch latest offsets for backlog.
>>
>> Ideally we should be dropping 0.9 support for next major release, in fact
>> better to drop versions before 0.10.1 at the same time. This would further
>> reduce reflection based calls for supporting multiple versions. If the
>> users still on 0.9 could stay on current stable release of Beam, dropping
>> would not affect them. Otherwise, it would be good to hear from them about
>> how long we need to keep support for old versions.
>>
>> I don't think it is a good idea to have multiple forks of KafkaIO in the
>> same repo. If we do go that route, we should fork the entire kafka
>> directory and rename the main class KafkaIO_Unmaintained :).
>>
>> IMHO, so far, additional complexity for supporting these versions is not
>> that bad. Most of it is isolated to ConsumerSpEL.java & ProducerSpEL.java.
>> My first preference is dropping support for deprecated versions (and
>> deprecating a few more versions, maybe till the version that added
>> transactions around 0.11.x I think).
>>
>> I haven't looked into what's new in Kafka 2.x. Are there any features
>> that KafkaIO should take advantage of? I have not noticed our existing code
>> breaking. We should certainly support the latest releases of Kafka.
>>
>> Raghu.
>>
>> On Tue, Apr 2, 2019 at 10:27 AM Mingmin Xu  wrote:
>>
>>>
>>> We're still using Kafka 0.10 a lot, similar as 0.9 IMO. To expand
>>> multiple versions in KafkaIO is quite complex now, and it confuses users
>>> which is supported / which is not. I would prefer to support Kafka 2.0+
>>> only in the latest version. For old versions, there're some options:
>>> 1). document Kafka-Beam support versions, like what we do in FlinkRunner;
>>> 2). maintain separated KafkaIOs for old versions;
>>>
>>> 1) would be easy to maintain, and I assume there should be no issue to
>>> use Beam-Core 3.0 together with KafkaIO 2.0.
>>>
>>> Any thoughts?
>>>
>>> Mingmin
>>>
>>> On Tue, Apr 2, 2019 at 9:56 AM Reuven Lax  wrote:
>>>
 KafkaIO is marked as Experimental, and the comment already warns that
 0.9 support might be removed. I think that if users still rely on Kafka 0.9
 we should leave a fork (renamed) of the IO in the tree for 0.9, but we can
 definitely remove 0.9 support from the main IO if we want, especially if
 it's complicated changes to that IO. If we do though, we should fail with a
 clear error message telling users to use the Kafka 0.9 IO.

 On Tue, Apr 2, 2019 at 9:34 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> > How are multiple versions of Kafka supported? Are they all in one
> client, or is there a case for forks like ElasticSearchIO?
>
> They are supported in one client but we have additional “ConsumerSpEL”
> adapter which unifies interface differences among different Kafka client
> versions (mostly to support old ones 0.9-0.10.0).
>
> On the other hand, we warn user in Javadoc of KafkaIO (which is
> Unstable, btw) by the following:
> “KafkaIO relies on kafka-clients for all its interactions with the
> Kafka cluster. kafka-clients versions 0.10.1 and newer are supported
> at runtime. The older versions 0.9.x - 0.10.0.0 are also supported,
> but are deprecated and likely be removed in near future.”
>
> Personally, I'd prefer to have only one unified client interface, but
> since people still use Beam with old Kafka instances, we should likely
> stick with it till Beam 3.0.
>
> WDYT?
>
> On 2 Apr 2019, at 02:27, Austin Bennett 
> wrote:
>
> FWIW --
>
> On my (desired, not explicitly job-function) roadmap is to