Re: [DISCUSS] Move Avro dependency out of core Beam

2020-09-03 Thread Reuven Lax
As for 2, maybe it's time to remove @Experimental from SchemaCoder?

1 is tricky though. Changes like this have caused a lot of trouble for
users in the past, and I think some users still have unpleasant memories of
being told "you just have to change some package names and imports."

On Thu, Sep 3, 2020 at 6:18 PM Brian Hulette  wrote:

> Hi everyone,
> The fact that core Beam has a dependency on Avro has led to a lot of
> headaches when users (or runners) are using a different version. zeidoo [1]
> was generous enough to put up a WIP PR [2] that moves everything that
> depends on Avro (primarily AvroCoder and the Avro SchemaProvider I believe)
> out of core Beam and into a separate extensions module. This way we could
> have multiple extensions for different versions of Avro in the future.
>
> As I understand it, the downsides to making this change are:
> 1) It's a breaking change, users with AvroCoder in their pipeline will
> need to change their build dependencies and import statements.
> 2) AvroCoder is the only (non-experimental) coder in core Beam that can
> encode complex user types. So new users will need to dabble with the
> Experimental SchemaCoder or add a second dependency to build a pipeline
> with their own types.
>
> I think these costs are outweighed by the benefit of removing the
> dependency in core Beam, but I wanted to reach out to the community to see
> if there are any objections.
>
> Brian
>
> [1] github.com/zeidoo
> [2] https://github.com/apache/beam/pull/12748
>


Re: Clear Timer in Java SDK

2020-09-03 Thread Boyuan Zhang
Thanks, Luke!

Do we want to support this API in the Java SDK? And what's the common approach
for now in the Java SDK to "kind of" clear a timer, like setting the
fireTimestamp to the far future?
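
For reference, here is a minimal sketch of that workaround using the existing Java
state/timer APIs: push the timer to the window's max timestamp and/or keep a
"cancelled" flag in state and ignore the firing in @OnTimer. The DoFn, its field
names, and the cancellation condition below are hypothetical.

import org.apache.beam.sdk.coders.BooleanCoder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Hypothetical DoFn illustrating the "kind of clear" workaround.
class ExpiryFn extends DoFn<KV<String, String>, String> {

  @TimerId("expiry")
  private final TimerSpec expirySpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @StateId("cancelled")
  private final StateSpec<ValueState<Boolean>> cancelledSpec = StateSpecs.value(BooleanCoder.of());

  @ProcessElement
  public void process(
      ProcessContext c,
      BoundedWindow window,
      @TimerId("expiry") Timer expiry,
      @StateId("cancelled") ValueState<Boolean> cancelled) {
    if ("cancel".equals(c.element().getValue())) { // placeholder cancellation condition
      // Workaround 1: push the timer as far out as the window allows, so it only
      // fires at window expiration.
      expiry.set(window.maxTimestamp());
      // Workaround 2: remember that the timer is logically cleared.
      cancelled.write(true);
    } else {
      expiry.set(c.timestamp().plus(Duration.standardMinutes(10)));
      cancelled.write(false);
    }
  }

  @OnTimer("expiry")
  public void onExpiry(OnTimerContext c, @StateId("cancelled") ValueState<Boolean> cancelled) {
    if (Boolean.TRUE.equals(cancelled.read())) {
      return; // The timer was "cleared", so ignore the firing.
    }
    c.output("expired: " + c.timestamp());
  }
}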

On Thu, Sep 3, 2020 at 3:43 PM Luke Cwik  wrote:

> Java SDK hasn't exposed the ability to remove timers.
>
> On Wed, Sep 2, 2020 at 11:00 AM Boyuan Zhang  wrote:
>
>> Hi team,
>>
>> I'm looking for something similar to timer.clear() from the Python SDK [1] in the
>> Java SDK, but it seems like we haven't exposed a timer-clearing API on the Java
>> Timer. Does the Java SDK have another way to clear a timer, or have we just not
>> worked on this API yet?
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/bundle_processor.py#L660-L671
>>
>


[DISCUSS] Move Avro dependency out of core Beam

2020-09-03 Thread Brian Hulette
Hi everyone,
The fact that core Beam has a dependency on Avro has led to a lot of
headaches when users (or runners) are using a different version. zeidoo [1]
was generous enough to put up a WIP PR [2] that moves everything that
depends on Avro (primarily AvroCoder and the Avro SchemaProvider I believe)
out of core Beam and into a separate extensions module. This way we could
have multiple extensions for different versions of Avro in the future.

As I understand it, the downsides to making this change are:
1) It's a breaking change, users with AvroCoder in their pipeline will need
to change their build dependencies and import statements.
2) AvroCoder is the only (non-experimental) coder in core Beam that can
encode complex user types. So new users will need to dabble with the
Experimental SchemaCoder or add a second dependency to build a pipeline
with their own types.
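
To make (2) concrete, here is a minimal sketch of the two options for a hypothetical
user type MyEvent (the class, its fields, and the parse helper are illustrative
only): AvroCoder is the class that would move to the extensions module, while the
schema-based route needs no Avro dependency but currently goes through the
Experimental schema APIs.

import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptor;

public class CoderChoiceSketch {

  // Option B: rely on schema inference (SchemaCoder); no Avro dependency needed,
  // but the schema APIs are still marked @Experimental today.
  @DefaultSchema(JavaFieldSchema.class)
  public static class MyEvent {
    public String id;
    public long count;
  }

  static MyEvent parse(String line) { // placeholder parsing logic
    MyEvent event = new MyEvent();
    event.id = line;
    event.count = 1L;
    return event;
  }

  // Option A: explicitly attach AvroCoder, the class this proposal would move out
  // of core (users would then change the dependency and the import).
  static PCollection<MyEvent> parsedWithAvroCoder(PCollection<String> lines) {
    return lines
        .apply(MapElements.into(TypeDescriptor.of(MyEvent.class)).via(CoderChoiceSketch::parse))
        .setCoder(AvroCoder.of(MyEvent.class));
  }
}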

I think these costs are outweighed by the benefit of removing the
dependency in core Beam, but I wanted to reach out to the community to see
if there are any objections.

Brian

[1] github.com/zeidoo
[2] https://github.com/apache/beam/pull/12748


Re: [VOTE] Release 2.24.0, release candidate #2

2020-09-03 Thread Daniel Oliveira
Alright, it seems like this is a regression so it'll definitely need to be
fixed. I'm going to be out of office starting tomorrow until next Tuesday,
so I'll try to get a new release candidate out tonight so it's available
while I'm out.

Anyway, this vote is closed.

As usual, I recommend continuing to test RC2 until RC3 is out.

On Thu, Sep 3, 2020 at 12:51 PM Pablo Estrada  wrote:

> -1
> I've confirmed the issue reproduces in 2.24. The fix includes a test case
> (https://github.com/apache/beam/pull/12761). I will send a cherry-pick
> for this.
> Best
> -P.
>
> On Thu, Sep 3, 2020 at 12:14 PM Pablo Estrada  wrote:
>
>> I just discovered there may be an issue in the BigQuery connector for
>> Python, where very large imports to BQ may not be working properly due to
>> copy job ids being duplicated (PR to fix on master is here:
>> https://github.com/apache/beam/pull/12761)
>> I will try to reproduce the error on this RC, and if I can repro it, then
>> I'll vote -1, as the WriteToBigQuery transform is used by many users.
>>
>> On Thu, Sep 3, 2020 at 2:12 AM Ismaël Mejía  wrote:
>>
>>> I just want to confirm that the issue I reported in RC1 is now fixed.
>>> Thanks Daniel!
>>>
>>> On Thu, Sep 3, 2020 at 6:44 AM Daniel Oliveira 
>>> wrote:
>>> >
>>> > This RC was built with the expected version of OpenJDK, so it should
>>> fix the issue from the previous RC.
>>> >
>>> > Unfortunately Dataflow containers are not yet ready, so there will be
>>> some delay before Dataflow can be tested. I'm working on getting that done
>>> ASAP and will update this thread once they're ready.
>>> >
>>> > On Wed, Sep 2, 2020 at 9:40 PM Daniel Oliveira 
>>> wrote:
>>> >>
>>> >> Hi everyone,
>>> >> Please review and vote on the release candidate #2 for the version
>>> 2.24.0, as follows:
>>> >> [ ] +1, Approve the release
>>> >> [ ] -1, Do not approve the release (please provide specific comments)
>>> >>
>>> >>
>>> >> The complete staging area is available for your review, which
>>> includes:
>>> >> * JIRA release notes [1],
>>> >> * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
>>> >> * all artifacts to be deployed to the Maven Central Repository [4],
>>> >> * source code tag "v2.24.0-RC2" [5],
>>> >> * website pull request listing the release [6], publishing the API
>>> reference manual [7], and the blog post [8].
>>> >> * Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0.
>>> >> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>> >> * Validation sheet with a tab for 2.24.0 release to help with
>>> validation [9].
>>> >> * Docker images published to Docker Hub [10].
>>> >>
>>> >> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>> >>
>>> >> Thanks,
>>> >> Release Manager
>>> >>
>>> >> [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
>>> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
>>> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> >> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1109/
>>> >> [5] https://github.com/apache/beam/tree/v2.24.0-RC2
>>> >> [6] https://github.com/apache/beam/pull/12743
>>> >> [7] https://github.com/apache/beam-site/pull/607
>>> >> [8] https://github.com/apache/beam/pull/12745
>>> >> [9]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
>>> >> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>> >>
>>>
>>


Re: Clear Timer in Java SDK

2020-09-03 Thread Luke Cwik
Java SDK hasn't exposed the ability to remove timers.

On Wed, Sep 2, 2020 at 11:00 AM Boyuan Zhang  wrote:

> Hi team,
>
> I'm looking for something similar to timer.clear() from the Python SDK [1] in the
> Java SDK, but it seems like we haven't exposed a timer-clearing API on the Java
> Timer. Does the Java SDK have another way to clear a timer, or have we just not
> worked on this API yet?
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/bundle_processor.py#L660-L671
>


Re: Re-running GitHub Actions jobs

2020-09-03 Thread Brian Hulette
There's a "Re-run Jobs" button at the top right when you open up one of the
jobs:

[image: image.png]

On Thu, Sep 3, 2020 at 12:02 PM Heejong Lee  wrote:

>
>
> On Thu, Sep 3, 2020 at 11:05 AM Brian Hulette  wrote:
>
>> The new GitHub Actions workflows that run Java and Python tests against
>> different targets (macos, ubuntu, windows) are great! But just like our
>> Jenkins infra they flake occasionally. Should we be re-running all of these
>> jobs until we get green runs?
>>
>> Unfortunately it's not possible to re-run an individual job in a workflow
>> [1]; the only option is to re-run all jobs, so flaky tests become even more
>> problematic.
>>
>> I see two options:
>> 1) Consider it "good enough" if just Jenkins CI passes and any GitHub
>> actions failures appear to be flakes.
>> 2) Require that all Jenkins and GitHub checks pass.
>>
>> My vote is for (2). (1) risks merging legitimate breakages, and one could
>> argue that making flaky tests extra painful is a good thing. Also we can
>> always make an exception if an obvious flake is blocking a critical PR.
>>
>
> +1 for (2), given that it might not be so easy to figure out whether the
> failure is flaky (or how critical it is).
> BTW, I see it's impossible to re-run a specific test, but how do we re-run
> all tests then? Is there a menu item for it, or do we need to force-update
> the commits?
>
>
>>
>>
>> Also FYI - at first I thought these workflows only had the stdout
>> available, but the test report directory is also zipped and uploaded as an
>> artifact. When a failure occurs you can download it to get the full output:
>> [image: image.png]
>>
>>
>> Brian
>>
>> [1]
>> https://github.community/t/ability-to-rerun-just-a-single-job-in-a-workflow/17234
>>
>


Re: Re-running GitHub Actions jobs

2020-09-03 Thread Heejong Lee
On Thu, Sep 3, 2020 at 11:05 AM Brian Hulette  wrote:

> The new GitHub Actions workflows that run Java and Python tests against
> different targets (macos, ubuntu, windows) are great! But just like our
> Jenkins infra they flake occasionally. Should we be re-running all of these
> jobs until we get green runs?
>
> Unfortunately it's not possible to re-run an individual job in a workflow
> [1]; the only option is to re-run all jobs, so flaky tests become even more
> problematic.
>
> I see two options:
> 1) Consider it "good enough" if just Jenkins CI passes and any GitHub
> actions failures appear to be flakes.
> 2) Require that all Jenkins and GitHub checks pass.
>
> My vote is for (2). (1) risks merging legitimate breakages, and one could
> argue that making flaky tests extra painful is a good thing. Also we can
> always make an exception if an obvious flake is blocking a critical PR.
>

+1 for (2), given that it might not be so easy to figure out whether the
failure is flaky (or how critical it is).
BTW, I see it's impossible to re-run a specific test, but how do we re-run
all tests then? Is there a menu item for it, or do we need to force-update the
commits?


>
>
> Also FYI - at first I thought these workflows only had the stdout
> available, but the test report directory is also zipped and uploaded as an
> artifact. When a failure occurs you can download it to get the full output:
> [image: image.png]
>
>
> Brian
>
> [1]
> https://github.community/t/ability-to-rerun-just-a-single-job-in-a-workflow/17234
>


Re-running GitHub Actions jobs

2020-09-03 Thread Brian Hulette
The new GitHub Actions workflows that run Java and Python tests against
different targets (macos, ubuntu, windows) are great! But just like our
Jenkins infra they flake occasionally. Should we be re-running all of these
jobs until we get green runs?

Unfortunately it's not possible to re-run an individual job in a workflow
[1]; the only option is to re-run all jobs, so flaky tests become even more
problematic.

I see two options:
1) Consider it "good enough" if just Jenkins CI passes and any GitHub
actions failures appear to be flakes.
2) Require that all Jenkins and GitHub checks pass.

My vote is for (2). (1) risks merging legitimate breakages, and one could
argue that making flaky tests extra painful is a good thing. Also we can
always make an exception if an obvious flake is blocking a critical PR.


Also FYI - at first I thought these workflows only had the stdout
available, but the test report directory is also zipped and uploaded as an
artifact. When a failure occurs you can download it to get the full output:
[image: image.png]


Brian

[1]
https://github.community/t/ability-to-rerun-just-a-single-job-in-a-workflow/17234


Re: KinesisIO - aggregation

2020-09-03 Thread Jonothan Farr
Hi Mani. The connector that’s in io/kinesis is based on version 1 of the AWS 
SDK for Java and the one in amazon-web-services2 is based on the version 2 SDK. 
My understanding is that there were some licensing issues preventing it from 
being bundled together with the other AWS IOs initially but those have since 
been resolved. For now the two connectors are functionally equivalent, but v1 is
deprecated, so you should use v2 instead.
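
For anyone landing on this thread, here is a minimal read sketch based on the v1
connector (org.apache.beam.sdk.io.kinesis). The stream name, credentials, and region
are placeholders, and the v2 connector in amazon-web-services2 is configured along
the same lines; treat this as a sketch rather than a drop-in example.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kinesis.KinesisIO;
import org.apache.beam.sdk.io.kinesis.KinesisRecord;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class KinesisReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<byte[]> payloads =
        p.apply(
                "ReadKinesis",
                KinesisIO.read()
                    .withStreamName("my-stream") // placeholder
                    .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)
                    // Placeholder credentials/region; use a real provider in practice.
                    .withAWSClientsProvider("ACCESS_KEY", "SECRET_KEY", Regions.EU_WEST_1))
            .apply(
                "ExtractPayload",
                ParDo.of(
                    new DoFn<KinesisRecord, byte[]>() {
                      @ProcessElement
                      public void process(@Element KinesisRecord record, OutputReceiver<byte[]> out) {
                        // KPL-aggregated records arrive already deaggregated by the
                        // connector, so each KinesisRecord is one user record.
                        out.output(record.getDataAsBytes());
                      }
                    }));

    p.run().waitUntilFinish();
  }
}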

> On Sep 3, 2020, at 1:45 AM, Sunny, Mani Kolbe  wrote:
> 
> 
> Hi Jonothan,
>  
> That is good news! I was under the impression that it is not supported, or that
> you have to enable some flag. By the way, what is the difference between
> KinesisIO and aws V2? Is aws V2 a set of classes to support AWS-related
> connectors, with KinesisIO referring back to them?
>
> I mean, for read/write from Kinesis, is KinesisIO still the way to go?
>  
> Regards,
> Mani
>  
> From: Jonothan Farr  
> Sent: Thursday, September 3, 2020 1:02 AM
> To: dev@beam.apache.org
> Subject: Re: KinesisIO - aggregation
>  
> KinesisIO works fine for me with aggregated records. Here's where 
> deaggregate() is called in v1:
> 
> https://github.com/apache/beam/blob/9f4d54a2e60ec45150437c0b050f4c73cca91f36/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java#L170
> 
> And v2:
> 
> https://github.com/apache/beam/blob/9f4d54a2e60ec45150437c0b050f4c73cca91f36/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClient.java#L181
> 
> Were you having problems getting it to work?
>  
> On Wed, Sep 2, 2020 at 8:22 AM Alexey Romanenko  
> wrote:
> Yes, it’s not supported for now but, at first sight, it seems that we
> just need to call UserRecord.deaggregate() in GetKinesisRecordsResult in case
> the record is aggregated.
>  
> 
> On 2 Sep 2020, at 14:30, Sunny, Mani Kolbe  wrote:
>  
> Hi Alexey,
>  
> I am looking at reading a Kinesis stream that has aggregated records. From
> your reply, I take it that this is currently not supported? Could that be solved
> by adding an uncompression function in the pipeline?
>  
> Regards,
> Mani
>  
> From: Alexey Romanenko  
> Sent: Tuesday, September 1, 2020 6:04 PM
> To: dev@beam.apache.org
> Subject: Re: KinesisIO - aggregation
>  
> Hello Mani, 
>  
> For the Write part, it should already be supported since KinesisIO uses KPL to
> write records under the hood, so it’s just a question of proper
> configuration [1][2][3].
> For the Read part, since it’s based on the AWS API, it’s more complicated and
> we need to add support for this explicitly.
>  
> [1] 
> https://github.com/apache/beam/blob/afa59c463ca9c5d42a9ab13402a72c8b95240fa5/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java#L657
> [2] https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-config.html
> [3] 
> https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties
>  
> 
> On 31 Aug 2020, at 16:12, Sunny, Mani Kolbe  wrote:
>  
> Hello,
>  
> Does Beam have plans to support Kinesis records with aggregation[1]? I see 
> some code [2] in the repo related to that. Is it planned for any near future 
> releases?
>  
> Regards,
> Mani
>  
>  
> [1] 
> https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html#kinesis-kpl-concepts-aggretation
> [2] 
> https://github.com/apache/beam/blob/22822b8c611f4dd3b2039e4f7e7d98fcab39/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClient.java#L181
>  


RE: KinesisIO - aggregation

2020-09-03 Thread Sunny, Mani Kolbe
No worries, Alexey. It is hard to keep track of everything as more and more IO
connectors are added to Beam. Time to have subject-matter experts for each IO.

From: Alexey Romanenko 
Sent: Thursday, September 3, 2020 12:46 PM
To: dev@beam.apache.org
Subject: Re: KinesisIO - aggregation

Oops, my bad, I missed that it’s already supported. Thanks for clarification!


On 3 Sep 2020, at 02:02, Jonothan Farr wrote:

KinesisIO works fine for me with aggregated records. Here's where deaggregate()
is called in v1:

https://github.com/apache/beam/blob/9f4d54a2e60ec45150437c0b050f4c73cca91f36/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java#L170

And v2:

https://github.com/apache/beam/blob/9f4d54a2e60ec45150437c0b050f4c73cca91f36/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClient.java#L181

Were you having problems getting it to work?

On Wed, Sep 2, 2020 at 8:22 AM Alexey Romanenko wrote:
Yes, it’s not supported for now but, at first sight, it seems that we just
need to call UserRecord.deaggregate() in GetKinesisRecordsResult in case the
record is aggregated.


On 2 Sep 2020, at 14:30, Sunny, Mani Kolbe wrote:

Hi Alexey,

I am looking at reading a Kinesis stream that has aggregated records. From your
reply, I take it that this is currently not supported? Could that be solved by
adding an uncompression function in the pipeline?

Regards,
Mani

From: Alexey Romanenko
Sent: Tuesday, September 1, 2020 6:04 PM
To: dev@beam.apache.org
Subject: Re: KinesisIO - aggregation

Hello Mani,

For the Write part, it should already be supported since KinesisIO uses KPL to
write records under the hood, so it’s just a question of proper configuration
[1][2][3].
For the Read part, since it’s based on the AWS API, it’s more complicated and we
need to add support for this explicitly.

[1] 
https://github.com/apache/beam/blob/afa59c463ca9c5d42a9ab13402a72c8b95240fa5/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java#L657
[2] 
https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-config.html
[3] 
https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties

On 31 Aug 2020, at 16:12, Sunny, Mani Kolbe wrote:

Hello,

Does Beam have plans to support Kinesis records with aggregation[1]? I see some 
code [2] in the repo related to that. 

Re: KinesisIO - aggregation

2020-09-03 Thread Alexey Romanenko
Oops, my bad, I missed that it’s already supported. Thanks for clarification!

> On 3 Sep 2020, at 02:02, Jonothan Farr  wrote:
> 
> KinesisIO works fine for me with aggregated records. Here's where
> deaggregate() is called in v1:
> 
> https://github.com/apache/beam/blob/9f4d54a2e60ec45150437c0b050f4c73cca91f36/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java#L170
>  
> 
> 
> And v2:
> 
> https://github.com/apache/beam/blob/9f4d54a2e60ec45150437c0b050f4c73cca91f36/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClient.java#L181
>  
> 
> 
> Were you having problems getting it to work?
> 
> On Wed, Sep 2, 2020 at 8:22 AM Alexey Romanenko wrote:
> Yes, it’s not supported for now but, at first sight, it seems that we
> just need to call UserRecord.deaggregate() in GetKinesisRecordsResult in case
> the record is aggregated.
>  
>> On 2 Sep 2020, at 14:30, Sunny, Mani Kolbe wrote:
>> 
>> Hi Alexey,
>>  
>> I am looking at reading a Kinesis stream that has aggregated records.
>> From your reply, I take it that this is currently not supported? Could that be
>> solved by adding an uncompression function in the pipeline?
>>  
>> Regards,
>> Mani
>>  
>> From: Alexey Romanenko
>> Sent: Tuesday, September 1, 2020 6:04 PM
>> To: dev@beam.apache.org 
>> Subject: Re: KinesisIO - aggregation
>>  
>> Hello Mani, 
>>  
>> For the Write part, it should already be supported since KinesisIO uses KPL to
>> write records under the hood, so it’s just a question of proper
>> configuration [1][2][3].
>> For the Read part, since it’s based on the AWS API, it’s more complicated and
>> we need to add support for this explicitly.
>>  
>> [1] 
>> https://github.com/apache/beam/blob/afa59c463ca9c5d42a9ab13402a72c8b95240fa5/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java#L657
>>  
>> 
>> [2] https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-config.html 
>> 
>> [3] 
>> https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties
>>  
>> 
>> 
>> 
>> On 31 Aug 2020, at 16:12, Sunny, Mani Kolbe wrote:
>>  
>> Hello,
>>  
>> Does Beam have plans to support Kinesis records with aggregation[1]? I see 
>> some code [2] in the repo related to that. Is it planned for any near future 
>> releases?
>>  
>> Regards,
>> Mani
>>  
>>  
>> [1] 
>> https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html#kinesis-kpl-concepts-aggretation
>>  
>> 
>> [2] 
>> https://github.com/apache/beam/blob/22822b8c611f4dd3b2039e4f7e7d98fcab39/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClient.java#L181
>>  
>> 

Re: [VOTE] Release 2.24.0, release candidate #2

2020-09-03 Thread Ismaël Mejía
I just want to confirm that the issue I reported in RC1 is now fixed.
Thanks Daniel!

On Thu, Sep 3, 2020 at 6:44 AM Daniel Oliveira  wrote:
>
> This RC was built with the expected version of OpenJDK, so it should fix the 
> issue from the previous RC.
>
> Unfortunately Dataflow containers are not yet ready, so there will be some 
> delay before Dataflow can be tested. I'm working on getting that done ASAP 
> and will update this thread once they're ready.
>
> On Wed, Sep 2, 2020 at 9:40 PM Daniel Oliveira  wrote:
>>
>> Hi everyone,
>> Please review and vote on the release candidate #2 for the version 2.24.0, 
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org [2], 
>> which is signed with the key with fingerprint 
>> D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.24.0-RC2" [5],
>> * website pull request listing the release [6], publishing the API reference 
>> manual [7], and the blog post [8].
>> * Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0.
>> * Python artifacts are deployed along with the source release to the 
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.24.0 release to help with validation [9].
>> * Docker images published to Docker Hub [10].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Release Manager
>>
>> [1] 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1109/
>> [5] https://github.com/apache/beam/tree/v2.24.0-RC2
>> [6] https://github.com/apache/beam/pull/12743
>> [7] https://github.com/apache/beam-site/pull/607
>> [8] https://github.com/apache/beam/pull/12745
>> [9] 
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>


RE: KinesisIO - aggregation

2020-09-03 Thread Sunny, Mani Kolbe
Hi Jonothan,

That is good news! I was under the impression that it is not supported, or that
you have to enable some flag. By the way, what is the difference between KinesisIO
and aws V2? Is aws V2 a set of classes to support AWS-related connectors, with
KinesisIO referring back to them?

I mean, for read/write from Kinesis, is KinesisIO still the way to go?

Regards,
Mani

From: Jonothan Farr 
Sent: Thursday, September 3, 2020 1:02 AM
To: dev@beam.apache.org
Subject: Re: KinesisIO - aggregation

KinesisIO works fine for me with aggregated records. Here's where deaggregate()
is called in v1:

https://github.com/apache/beam/blob/9f4d54a2e60ec45150437c0b050f4c73cca91f36/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java#L170

And v2:

https://github.com/apache/beam/blob/9f4d54a2e60ec45150437c0b050f4c73cca91f36/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/kinesis/SimplifiedKinesisClient.java#L181

Were you having problems getting it to work?

On Wed, Sep 2, 2020 at 8:22 AM Alexey Romanenko wrote:
Yes, it’s not supported for now but, at first sight, it seems that we just
need to call UserRecord.deaggregate() in GetKinesisRecordsResult in case the
record is aggregated.


On 2 Sep 2020, at 14:30, Sunny, Mani Kolbe wrote:

Hi Alexey,

I am looking at reading a Kinesis stream that has aggregated records. From your
reply, I take it that this is currently not supported? Could that be solved by
adding an uncompression function in the pipeline?

Regards,
Mani

From: Alexey Romanenko
Sent: Tuesday, September 1, 2020 6:04 PM
To: dev@beam.apache.org
Subject: Re: KinesisIO - aggregation

Hello Mani,

For the Write part, it should already be supported since KinesisIO uses KPL to
write records under the hood, so it’s just a question of proper configuration
[1][2][3].
For the Read part, since it’s based on the AWS API, it’s more complicated and we
need to add support for this explicitly.

[1] 
https://github.com/apache/beam/blob/afa59c463ca9c5d42a9ab13402a72c8b95240fa5/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java#L657
[2] 
https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-config.html
[3] 
https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties

On 31 Aug 2020, at 16:12, Sunny, Mani Kolbe wrote:

Hello,

Does Beam have plans to support Kinesis records with aggregation[1]? I see 

Re: Could someone review my pull request 12695 ?

2020-09-03 Thread Niel Markwick
I have a bug/PR open on the Spanner client libraries, but replacing
ImmutableList with List would be a breaking API change for them compared to
an already-released version, so it is unlikely to be fixed soon.
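
To make the point concrete, a small sketch of why this is hard to undo, using a
hypothetical client interface (not the actual Spanner API):

import com.google.common.collect.ImmutableList;
import java.util.List;

// Hypothetical client-library interface, for illustration only.
public interface AsyncResults {

  // Exposing a Guava type in a public signature puts ImmutableList (and the
  // supertypes and iterators it drags in) onto the API surface of every library
  // that re-exposes this type -- which is why the ApiSurface allowlist in the
  // discussion quoted below had to grow.
  ImmutableList<String> currentRows();

  // The alternative is to return the interface instead, but changing an
  // already-released ImmutableList return type to List is source- and
  // binary-incompatible for existing callers, hence "unlikely to be fixed soon".
  List<String> currentRowsAsList();
}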

On Wed, 2 Sep 2020, 01:09 terry xian,  wrote:

>
> As nielm pointed out in his comment on my PR, I added this because:
>
> "The spanner API change that exposes Guava classes is
> googleapis/java-spanner/pull/81.
>
> Specifically, adding AsyncResultSet in
> googleapis/java-spanner/pull/81/files#diff-7a9cb34faeb259be46b44f1878b7210f,
> which returns an ImmutableList."
>
>
> Without this addition, the ApiSurface test would fail; please see
> beam_PreCommit_Java_Commit #13264 - testGcpApiSurface [Jenkins].
> So it was suggested that I add the newly exposed classes explicitly.
>
> Thanks!
>
>
>
>
> On Tuesday, September 1, 2020, 03:48:37 PM PDT, Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>
> BTW this PR adds the following to the API surface.
>
> (com.google.common.collect.ImmutableCollection.class),
> (com.google.common.collect.ImmutableCollection.Builder.class),
> (com.google.common.collect.ImmutableList.class),
> (com.google.common.collect.ImmutableList.Builder.class),
> (com.google.common.collect.UnmodifiableIterator.class),
> (com.google.common.collect.UnmodifiableListIterator.class),
>
> Any objections to this?
>
> Terry, could you explain the reason for adding this.
>
> Thanks,
> Cham
>
> On Tue, Sep 1, 2020 at 2:40 PM Chamikara Jayalath 
> wrote:
>
> LGTM. We can merge when tests pass.
>
> Thanks,
> Cham
>
> On Tue, Sep 1, 2020 at 1:32 PM terry xian  wrote:
>
> Hi,
>
> My pull request, "[BEAM-8758] Google-cloud-spanner upgrade to 1.59.0 and
> google_cloud_bigtable_client_core to 1.16.0" by terryxian78 (Pull Request
> #12695, apache/beam), has been open for more than 3 days. Although I've added a
> reviewer (lukecwik), I am afraid that I missed something that might cause the PR
> to go unnoticed (it is my first PR in Beam). I've asked some folks who work on
> Spanner to review my change, but I need a committer for approval.
>
> Could a committer review my PR?
>
> Thanks!
>