Flaky test issue report

2021-04-26 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests. These are P1 issues 
because they have a major negative impact on the community and make it hard to 
determine the quality of the software.

BEAM-12200: SamzaStoreStateInternalsTest is flaky 
(https://issues.apache.org/jira/browse/BEAM-12200)
BEAM-12163: Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK 
harness startup (https://issues.apache.org/jira/browse/BEAM-12163)
BEAM-12061: beam_PostCommit_SQL failing on 
KafkaTableProviderIT.testFakeNested 
(https://issues.apache.org/jira/browse/BEAM-12061)
BEAM-12020: :sdks:java:container:java8:docker failing missing licenses 
(https://issues.apache.org/jira/browse/BEAM-12020)
BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (https://issues.apache.org/jira/browse/BEAM-12019)
BEAM-11792: Python precommit failed (flaked?) installing package  
(https://issues.apache.org/jira/browse/BEAM-11792)
BEAM-11733: [beam_PostCommit_Java] [testFhirIO_Import|export] flaky 
(https://issues.apache.org/jira/browse/BEAM-11733)
BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (https://issues.apache.org/jira/browse/BEAM-11666)
BEAM-11662: elasticsearch tests failing 
(https://issues.apache.org/jira/browse/BEAM-11662)
BEAM-11661: hdfsIntegrationTest flake: network not found (py38 postcommit) 
(https://issues.apache.org/jira/browse/BEAM-11661)
BEAM-11646: beam_PostCommit_XVR_Spark failing 
(https://issues.apache.org/jira/browse/BEAM-11646)
BEAM-11645: beam_PostCommit_XVR_Flink failing 
(https://issues.apache.org/jira/browse/BEAM-11645)
BEAM-11541: testTeardownCalledAfterExceptionInProcessElement flakes on 
direct runner. (https://issues.apache.org/jira/browse/BEAM-11541)
BEAM-11540: Linter sometimes flakes on apache_beam.dataframe.frames_test 
(https://issues.apache.org/jira/browse/BEAM-11540)
BEAM-10995: Java + Universal Local Runner: 
WindowingTest.testWindowPreservation fails 
(https://issues.apache.org/jira/browse/BEAM-10995)
BEAM-10987: stager_test.py::StagerTest::test_with_main_session flaky on 
windows py3.6,3.7 (https://issues.apache.org/jira/browse/BEAM-10987)
BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-10968)
BEAM-10955: Flink Java Runner test flake: Could not find Flink job  
(https://issues.apache.org/jira/browse/BEAM-10955)
BEAM-10923: Python requirements installation in docker container is flaky 
(https://issues.apache.org/jira/browse/BEAM-10923)
BEAM-10899: test_FhirIO_exportFhirResourcesGcs flake with OOM 
(https://issues.apache.org/jira/browse/BEAM-10899)
BEAM-10866: PortableRunnerTestWithSubprocesses.test_register_finalizations 
flaky on macOS (https://issues.apache.org/jira/browse/BEAM-10866)
BEAM-10763: Spotless flake (NullPointerException) 
(https://issues.apache.org/jira/browse/BEAM-10763)
BEAM-10590: BigQueryQueryToTableIT flaky: test_big_query_new_types 
(https://issues.apache.org/jira/browse/BEAM-10590)
BEAM-10519: 
MultipleInputsAndOutputTests.testParDoWithSideInputsIsCumulative flaky on Samza 
(https://issues.apache.org/jira/browse/BEAM-10519)
BEAM-10504: Failure / flake in ElasticSearchIOTest > 
testWriteFullAddressing and testWriteWithIndexFn 
(https://issues.apache.org/jira/browse/BEAM-10504)
BEAM-10501: CheckGrafanaStalenessAlerts and PingGrafanaHttpApi fail with 
Connection refused (https://issues.apache.org/jira/browse/BEAM-10501)
BEAM-10485: Failure / flake: ElasticsearchIOTest > testWriteWithIndexFn 
(https://issues.apache.org/jira/browse/BEAM-10485)
BEAM-10272: Failure in CassandraIOTest init: cannot create cluster due to 
netty link error (https://issues.apache.org/jira/browse/BEAM-10272)
BEAM-9649: beam_python_mongoio_load_test started failing due to mismatched 
results (https://issues.apache.org/jira/browse/BEAM-9649)
BEAM-9392: TestStream tests are all flaky 
(https://issues.apache.org/jira/browse/BEAM-9392)
BEAM-9232: BigQueryWriteIntegrationTests is flaky coercing to Unicode 
(https://issues.apache.org/jira/browse/BEAM-9232)
BEAM-9119: 
apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest[...].test_large_elements
 is flaky (https://issues.apache.org/jira/browse/BEAM-9119)
BEAM-8879: IOError flake in PortableRunnerTest 
(https://issues.apache.org/jira/browse/BEAM-8879)
BEAM-8101: Flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for 
Direct, Spark, Flink (https://issues.apache.org/jira/browse/BEAM-8101)
BEAM-8035: [beam_PreCommit_Java_Phrase] 
[WatchTest.testMultiplePollsWithManyResults]  Flake: Outputs must be in 
timestamp order (https://issues.apache.org/jira/browse/BEAM-8035)
BEAM-7992: Unhandled type_constraint in 

P1 issues report

2021-04-26 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests.

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

BEAM-12229: WindmillStateCache has a 0% hit rate in 2.29 
(https://issues.apache.org/jira/browse/BEAM-12229)
BEAM-12227: LOOPBACK does not work for portable java pipelines 
(https://issues.apache.org/jira/browse/BEAM-12227)
BEAM-1: Dataflow side input translation "Unknown producer for value" 
(https://issues.apache.org/jira/browse/BEAM-1)
BEAM-12205: Dataflow pipelines broken NoSuchMethodError 
DoFnInvoker.invokeSetup() (https://issues.apache.org/jira/browse/BEAM-12205)
BEAM-12195: Flink Runner 1.11 uses old Scala-Version 
(https://issues.apache.org/jira/browse/BEAM-12195)
BEAM-11959: Python Beam SDK Harness hangs when installing pip packages 
(https://issues.apache.org/jira/browse/BEAM-11959)
BEAM-11906: No trigger early repeatedly for session windows 
(https://issues.apache.org/jira/browse/BEAM-11906)
BEAM-11875: XmlIO.Read does not handle XML encoding per spec 
(https://issues.apache.org/jira/browse/BEAM-11875)
BEAM-11828: JmsIO is not acknowledging messages correctly 
(https://issues.apache.org/jira/browse/BEAM-11828)
BEAM-11755: Cross-language consistency (RequiresStableInputs) is quietly 
broken (at least on portable flink runner) 
(https://issues.apache.org/jira/browse/BEAM-11755)
BEAM-11578: `dataflow_metrics` (python) fails with TypeError (when int 
overflowing?) (https://issues.apache.org/jira/browse/BEAM-11578)
BEAM-11576: Go ValidatesRunner failure: TestFlattenDup on Dataflow Runner 
(https://issues.apache.org/jira/browse/BEAM-11576)
BEAM-11434: Expose Spanner admin/batch clients in Spanner Accessor 
(https://issues.apache.org/jira/browse/BEAM-11434)
BEAM-11227: Upgrade beam-vendor-grpc-1_26_0-0.3 to fix CVE-2020-27216 
(https://issues.apache.org/jira/browse/BEAM-11227)
BEAM-11148: Kafka commitOffsetsInFinalize OOM on Flink 
(https://issues.apache.org/jira/browse/BEAM-11148)
BEAM-11017: Timer with dataflow runner can be set multiple times (dataflow 
runner) (https://issues.apache.org/jira/browse/BEAM-11017)
BEAM-10861: Adds URNs and payloads to PubSub transforms 
(https://issues.apache.org/jira/browse/BEAM-10861)
BEAM-10617: python CombineGlobally().with_fanout() cause duplicate combine 
results for sliding windows (https://issues.apache.org/jira/browse/BEAM-10617)
BEAM-10569: SpannerIO tests don't actually assert anything. 
(https://issues.apache.org/jira/browse/BEAM-10569)
BEAM-10288: Quickstart documents are out of date 
(https://issues.apache.org/jira/browse/BEAM-10288)
BEAM-10244: Populate requirements cache fails on poetry-based packages 
(https://issues.apache.org/jira/browse/BEAM-10244)
BEAM-10100: FileIO writeDynamic with AvroIO.sink not writing all data 
(https://issues.apache.org/jira/browse/BEAM-10100)
BEAM-9564: Remove insecure ssl options from MongoDBIO 
(https://issues.apache.org/jira/browse/BEAM-9564)
BEAM-9455: Environment-sensitive provisioning for Dataflow 
(https://issues.apache.org/jira/browse/BEAM-9455)
BEAM-9293: Python direct runner doesn't emit empty pane when it should 
(https://issues.apache.org/jira/browse/BEAM-9293)
BEAM-8986: SortValues may not work correct for numerical types 
(https://issues.apache.org/jira/browse/BEAM-8986)
BEAM-8985: SortValues should fail if SecondaryKey coder is not 
deterministic (https://issues.apache.org/jira/browse/BEAM-8985)
BEAM-8407: [SQL] Some Hive tests throw NullPointerException, but get marked 
as passing (Direct Runner) (https://issues.apache.org/jira/browse/BEAM-8407)
BEAM-7717: PubsubIO watermark tracking hovers near start of epoch 
(https://issues.apache.org/jira/browse/BEAM-7717)
BEAM-7716: PubsubIO returns empty message bodies for all messages read 
(https://issues.apache.org/jira/browse/BEAM-7716)
BEAM-7195: BigQuery - 404 errors for 'table not found' when using dynamic 
destinations - sometimes, new table fails to get created 
(https://issues.apache.org/jira/browse/BEAM-7195)
BEAM-6839: User reports protobuf ClassChangeError running against 2.6.0 or 
above (https://issues.apache.org/jira/browse/BEAM-6839)
BEAM-6466: KafkaIO doesn't commit offsets while being used as bounded 
source (https://issues.apache.org/jira/browse/BEAM-6466)


Re: [PROPOSAL] Preparing for Beam 2.30.0 release

2021-04-26 Thread Heejong Lee
On Mon, Apr 26, 2021 at 10:24 AM Robert Bradshaw 
wrote:

> Confirming that the cut date is 4/28/2021 (in two days), right?
>

Yes, 2.30.0 branch is scheduled to be cut on April 28.


>
> On Wed, Apr 21, 2021 at 4:41 PM Tomo Suzuki  wrote:
> >
> > Thank you for the preparation!
> >
> > > a few responses that some high priority changes
> >
> > Would you be willing to share the items for visibility?
>
> There are several PRs in flight (or recently merged) to get
> portability working well with Dataflow for this release.
>

We can still cherry-pick them by importance after the branch cut.


>
> >
> > On Wed, Apr 21, 2021 at 7:21 PM Kenneth Knowles  wrote:
> > >
> > > Also the 2.29.0 was re-cut.
> > >
> > > Usually a delay in one release should not delay the next release,
> because each release represents a certain quantity of changes. But in this
> case, the actual quantity of changes is affected by the re-cut, too.
> > >
> > > On Wed, Apr 21, 2021 at 4:12 PM Heejong Lee 
> wrote:
> > >>
> > >> Update on the 2.30.0 branch cut schedule:
> > >>
> > >> I'm thinking of delaying the branch cut a week since I've got a few
> responses that some high priority changes are still ongoing.
> > >>
> > >> The new cut date is April 28.
> > >>
> > >>
> > >> On Tue, Apr 20, 2021 at 6:07 PM Ahmet Altay  wrote:
> > >>>
> > >>> +1 and thank you!
> > >>>
> > >>> On Tue, Apr 20, 2021 at 4:55 PM Heejong Lee 
> wrote:
> > 
> >  Hi All,
> > 
> >  Beam 2.30.0 release is scheduled to be cut on April 21 according to
> the release calendar [1]
> > 
> >  I'd like to volunteer myself to be the release manager for this
> release. I plan on cutting the release branch on the scheduled date.
> > 
> >  Any comments or objections ?
> > 
> >  Thanks,
> >  Heejong
> > 
> >  [1]
> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com=America/Los_Angeles
> >
> >
> >
> > --
> > Regards,
> > Tomo
>


Re: BEAM-3713: Moving from nose to pytest

2021-04-26 Thread Udi Meiri
I'm about to merge https://github.com/apache/beam/pull/14481, which
converts 4 postcommit suites to pytest.

job_PostCommit_Python
job_PostCommit_Python_ValidatesContainer_Dataflow
job_PostCommit_Python_ValidatesRunner_Dataflow
job_PostCommit_Python_ValidatesRunner_Flink

Please report on the bug if you have issues (tests not running, missing
logs, missing results in Jenkins).
If you have an open PR that adds a new IT that should be running in one of
the above suites, please convert your decorators.

Examples (see PR for more):
@attr('IT') -> @pytest.mark.it_postcommit
@attr('ValidatesRunner') -> @pytest.mark.it_validatesrunner
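The selection logic behind these markers can be sketched with a small toy model. This is not pytest's actual implementation — just an illustration of how a `-m` expression such as `-m 'it_postcommit and not no_direct'` (mentioned in the thread below) filters tests by their marker sets; the marker names come from this thread.

```python
import re

def matches(markers, expr):
    """Toy model of pytest's "-m" selection: return True if a test carrying
    the given set of marker names satisfies the boolean expression `expr`."""
    # Treat every word that is not a boolean operator as a marker name.
    names = set(re.findall(r"\w+", expr)) - {"and", "or", "not"}
    # Each marker name evaluates to whether the test carries that marker.
    env = {name: (name in markers) for name in names}
    return eval(expr, {"__builtins__": {}}, env)

# A converted postcommit IT carries "it_postcommit"; a test additionally
# marked "no_direct" is excluded when running on the direct runner.
print(matches({"it_postcommit"}, "it_postcommit and not no_direct"))               # True
print(matches({"it_postcommit", "no_direct"}, "it_postcommit and not no_direct"))  # False
print(matches({"it_validatesrunner"}, "it_postcommit and not no_direct"))          # False
```

Real pytest parses the expression properly instead of using `eval`, but the semantics of marker-based selection are the same.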


On Thu, Mar 25, 2021 at 10:45 AM Udi Meiri  wrote:

> Hi Benjamin,
>
> AFAIK nose is only used for integration tests (unit tests were converted
> to pytest a while back).
> These ITs should all be running periodically (except maybe the release
> related ones?).
>
> I would start with selecting one of the Jenkins jobs and converting the
> ITs in it to pytest.
> Good place to start:
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/
> I would prioritize converting the Python jobs listed here:
> https://github.com/apache/beam/blob/master/.github/PULL_REQUEST_TEMPLATE.md
>
> There's a fairly old abandoned PR with some ideas:
> https://github.com/apache/beam/pull/7949/files
> Have a look at:
> sdks/python/scripts/run_integration_test.sh
> sdks/python/pytest.ini
> sdks/python/conftest.py
>
> My idea in that PR was to replace the nose @attr('IT') decorators with 1
> or more:
> @pytest.mark.it_postcommit,
> @pytest.mark.no_direct,
> etc.
> These decorators tell nose/pytest which tests to run.
> So if I wanted to run post-commit tests on direct runner I would use this
> pytest flag:
> "-m 'it_postcommit and not no_direct'".
>
>
> On Wed, Mar 24, 2021 at 5:41 PM Ahmet Altay  wrote:
>
>> All PRs look either merged or closed.
>>
>> +Udi Meiri  might have more information about the
>> remaining work.
>>
>> On Wed, Mar 24, 2021 at 5:29 PM Benjamin Gonzalez Delgado <
>> benjamin.gonza...@wizeline.com> wrote:
>>
>>> Hi team,
>>> I am planning to work in BEAM-3713
>>> , but I see there are
>>> PRs related to the task.
>>> Could someone guide me on the work that remains missing regarding the
>>> migration from nose to pytest?
>>> Any guidance on this would be appreciated.
>>>
>>> Thanks!
>>> Benjamin
>>>
>>
>>




Re: Should WindowFn have a mininal Duration?

2021-04-26 Thread Jan Lukavský

Hi Robert and Reuven,

I was not aware that implementing custom windowing logic is such a common 
practice. If so, it probably makes little sense to "force" users to specify 
the minimal duration - it could be made somewhat "user-friendly", but it 
would still require some work on the user side. Maybe I'll rephrase: the 
motivation is actually the ability to generate a set of BoundedWindow 
labels that cover a specific time interval and do not leave any window 
behind. This is obviously possible only for time-only windows (which is not 
the case Reuven mentioned with "terminating sessions" - those are 
data-sensitive windows). Maybe that would boil down to only the set of 
built-in WindowFns? Can we reasonably presume that users would create 
custom windows that are not sensitive to data? If so, that would seem like 
a generic type of window suitable to include among the built-in ones.


 Jan

On 4/26/21 8:28 PM, Reuven Lax wrote:
I've often seen custom windowfns with no static minimum duration. e.g. 
a common customization of sessions is to identify a specific "logout" 
event to end the session.


On Mon, Apr 26, 2021 at 11:08 AM Robert Bradshaw  wrote:

I do think minimal window duration is a meaningful concept for
WindowFns, but from the pragmatic perspective I would ask is it useful
enough to require all implementers of WindowFn to specify it (given
that a default value of 0 would not be very useful).

On Mon, Apr 26, 2021 at 10:05 AM Jan Lukavský  wrote:
>
> Hi Kenn,
>
> On 4/26/21 5:59 PM, Kenneth Knowles wrote:
>
> In +Reza Rokni's example of looping timers, it is necessary to
"seed" each key, for just the reason you say. The looping timer
itself for a key should be in the global window. The outputs of
the looping timer are windowed.
>
> Yes, exactly.
>
>
> All that said, your example seems possibly easier if you are OK
with no output for windows with no data.
>
> The problem is actually not with windows with no data. But with
windows containing only droppable data. This "toy example" is
interestingly much more complex than I expected. Pretty much due
to the reason, that there is no access to watermark while
processing elements. But yes, there are probably more efficient
ways to solve that, the best option would be to have access to the
input watermark (e.g. at the start of the bundle, that seems to be
well defined, though I understand there is some negative
experience with that approach). But I don't want to discuss the
solutions, actually.
>
> My "motivating example" was merely a motivation for me to ask
this question (and possible one more about side inputs is to
follow :)), but - giving all examples and possible solutions
aside, the question is - is a minimal duration an intrinsic
property of a WindowFn, or not? If yes, I think there are reasons
to include this property into the model. If no, then we can
discuss the reason why is it the case. I see the only problem with
data-driven windows, all other windows are time-based and as such,
probably carry this property. The data-driven WindowFns could have
this property defined as zero. This is not a super critical
request, more of a philosophical discussion.
>
>  Jan
>
> It sounds like you don't actually want to drop the data, yes?
You want to partition elements at some time X that is in the
middle of some event time interval. If I understand your chosen
approach, you could buffer the element w/ metadata and set the
timer in @ProcessElement. It is no problem if the timestamp of the
timer has already passed. It will fire immediately then. In the
@OnTimer you output from the buffer. I think there may be more
efficient ways to achieve this output.
>
> Kenn
>
> On Thu, Apr 22, 2021 at 2:48 AM Jan Lukavský  wrote:
>>
>> Hi,
>>
>> I have come across a "problem" while implementing some toy
Pipeline. I
>> would like to split input PCollection into two parts -
droppable data
>> (delayed for more than allowed lateness from the end of the
window) from
>> the rest. I will not go into details, as that is not relevant, the
>> problem is that I need to setup something like "looping timer"
to be
>> able to create state for a window, even when there is no data,
yet (to
>> be able to setup timer for the end of a window, to be able to
recognize
>> droppable data). I would like the solution to be generic, so I
would
>> like to "infer" the duration of the looping timer from the input
>> PCollection. What I would need is to know a _minimal guaranteed
duration
>> of a window that a WindowFn can generate_. I would then setup the
>> looping timer to tick 

Re: Should WindowFn have a mininal Duration?

2021-04-26 Thread Reuven Lax
I've often seen custom windowfns with no static minimum duration. e.g. a
common customization of sessions is to identify a specific "logout" event
to end the session.

On Mon, Apr 26, 2021 at 11:08 AM Robert Bradshaw 
wrote:

> I do think minimal window duration is a meaningful concept for
> WindowFns, but from the pragmatic perspective I would ask is it useful
> enough to require all implementers of WindowFn to specify it (given
> that a default value of 0 would not be very useful).
>
> On Mon, Apr 26, 2021 at 10:05 AM Jan Lukavský  wrote:
> >
> > Hi Kenn,
> >
> > On 4/26/21 5:59 PM, Kenneth Knowles wrote:
> >
> > In +Reza Rokni's example of looping timers, it is necessary to "seed"
> each key, for just the reason you say. The looping timer itself for a key
> should be in the global window. The outputs of the looping timer are
> windowed.
> >
> > Yes, exactly.
> >
> >
> > All that said, your example seems possibly easier if you are OK with no
> output for windows with no data.
> >
> > The problem is actually not with windows with no data. But with windows
> containing only droppable data. This "toy example" is interestingly much
> more complex than I expected. Pretty much due to the reason, that there is
> no access to watermark while processing elements. But yes, there are
> probably more efficient ways to solve that, the best option would be to
> have access to the input watermark (e.g. at the start of the bundle, that
> seems to be well defined, though I understand there is some negative
> experience with that approach). But I don't want to discuss the solutions,
> actually.
> >
> > My "motivating example" was merely a motivation for me to ask this
> question (and possible one more about side inputs is to follow :)), but -
> giving all examples and possible solutions aside, the question is - is a
> minimal duration an intrinsic property of a WindowFn, or not? If yes, I
> think there are reasons to include this property into the model. If no,
> then we can discuss the reason why is it the case. I see the only problem
> with data-driven windows, all other windows are time-based and as such,
> probably carry this property. The data-driven WindowFns could have this
> property defined as zero. This is not a super critical request, more of a
> philosophical discussion.
> >
> >  Jan
> >
> > It sounds like you don't actually want to drop the data, yes? You want
> to partition elements at some time X that is in the middle of some event
> time interval. If I understand your chosen approach, you could buffer the
> element w/ metadata and set the timer in @ProcessElement. It is no problem
> if the timestamp of the timer has already passed. It will fire immediately
> then. In the @OnTimer you output from the buffer. I think there may be more
> efficient ways to achieve this output.
> >
> > Kenn
> >
> > On Thu, Apr 22, 2021 at 2:48 AM Jan Lukavský  wrote:
> >>
> >> Hi,
> >>
> >> I have come across a "problem" while implementing some toy Pipeline. I
> >> would like to split input PCollection into two parts - droppable data
> >> (delayed for more than allowed lateness from the end of the window) from
> >> the rest. I will not go into details, as that is not relevant, the
> >> problem is that I need to setup something like "looping timer" to be
> >> able to create state for a window, even when there is no data, yet (to
> >> be able to setup timer for the end of a window, to be able to recognize
> >> droppable data). I would like the solution to be generic, so I would
> >> like to "infer" the duration of the looping timer from the input
> >> PCollection. What I would need is to know a _minimal guaranteed duration
> >> of a window that a WindowFn can generate_. I would then setup the
> >> looping timer to tick with interval of this minimal duration and that
> >> would guarantee the timer will hit all the windows.
> >>
> >> I could try to infer this duration from the input windowing with some
> >> hackish ways - e.g. using some "instanceof" approach, or by using the
> >> WindowFn to generate set of windows for some fixed timestamp (without
> >> data element) and then infer the time from maxTimestamp of the returned
> >> windows. That would probably break for sliding windows, because the
> >> result would be the duration of the slide, not the duration of the
> >> window (at least when doing naive computation).
> >>
> >> It seems to me, that all WindowFns have such a minimal Duration -
> >> obvious for Fixed Windows, but every other window type seems to have
> >> such property (including Sessions - that is the gap duration). The only
> >> problem would be with data-driven windows, but we don't have currently
> >> strong support for these.
> >>
> >> The question is then - would it make sense to introduce
> >> WindowFn.getMinimalWindowDuration() to the model? Default value could be
> >> zero, which would mean such WindowFn would be unsupported in my
> >> motivating example.
> >>
> >>   Jan
> >>
>


RE: [PROPOSAL] Upgrade Cassandra driver from 3.x to 4.x in CassandraIO

2021-04-26 Thread S Bhandiwad, Satwik (Nokia - IN/Bangalore)
Hi,

Point 1: I have updated the impact section of the design doc with all the 
breaking changes for users. 
link

Point 2: We have run only the integration tests; if you have suggestions for 
load tests, let us know and we'll try to run them.

Regards,
Satwik



From: Alexey Romanenko 
Sent: Thursday, April 22, 2021 7:58 PM
To: dev@beam.apache.org
Subject: Re: [PROPOSAL] Upgrade Cassandra driver from 3.x to 4.x in CassandraIO

Thanks, it looks promising!

I just have a couple things to ask.

1) Could you briefly summarise and add here and/or to the design doc all breaking 
changes for users that you expect (if any)? Can we avoid them, at least 
temporarily? For example, we used to deprecate an old public API and keep it for 
the next three Beam releases before removing it completely.

2) Also, did you run any load tests to compare the performance between two 
driver versions for the same pipeline and datasets? If yes, could you share the 
results, please?


--
Alexey


On 20 Apr 2021, at 07:47, D, Anup (Nokia - IN/Bangalore)  wrote:

Hi All,

Satwik and I have been working together on this.
4.x is a major revamp, and we have highlighted below the major differences 
that were seen during this activity.
Please review and provide feedback.


  1.  Package names:
3.x : com.datastax.cassandra
4.x : com.datastax.oss
Comment : The 4.x package name is different from 3.x. We think both can co-exist; 
see JanusGraph, which has included both packages, for reference [1]


  2.  Mapping:
3.x : The default object mapper took care of mapping all Entity types at runtime - 
org.apache.beam.sdk.io.cassandra.DefaultObjectMapper
4.x : The mapper auto-generates helper classes at compile time by processing 
annotations on Mapper, Dao and Entity classes. Then, either a specific Dao or a 
generic Dao is used to access/map classes.[2][3]
Comment : With the objective of avoiding/limiting breaking changes, we found that 
providing a Generic/Base Dao via inheritance has limited breakage.[4]
Impacts :

a.   Requires a mapperFactoryFunction to be supplied that can return a 
SpecificDao reference.

b.   @GetEntity is the annotation that maps a ResultSet to an Entity and 
performs strict column checking between the two. This was not the case in 3.x. We 
had posted a query to the Cassandra community [5]


  3.  HadoopFormatIO
A unit test in HadoopFormatIO that interacts with Cassandra failed when the 
driver was upgraded to 4.x. The latest Apache Cassandra server still uses the 
3.x Cassandra connector.
There is an open JIRA [6][7]


  4.  Load balancing policy
3.x : Providing the data center name is optional.
4.x : Load balancing policies have been revamped; providing the data center name 
is mandatory.[8]


  5.  Configuration
3.x : This was done by configuring classes.
4.x : Along with configuring classes, file-based configuration is 
supported.[9][10]
Comment : We tested loading part of the configuration via a file and part 
programmatically. There is no impact as such, but this is a new complementary 
feature.


  6.  Driver compatibility
Cassandra 4.5+ drivers are fully compatible with Apache Cassandra 2.1+ 
versions.[11]
The open source driver implementation “com.datastax.oss” will be supported 
for interacting with open-source and commercial Cassandra.
There is no impact, but we are highlighting it here.
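As an illustration of the file-based configuration and the mandatory data center 
name discussed above, a driver 4.x application.conf might look like the sketch 
below. The contact point and datacenter values are placeholders; consult the 
driver's configuration reference for the authoritative key names.

```
datastax-java-driver {
  basic {
    contact-points = [ "127.0.0.1:9042" ]
    # In 4.x the default load-balancing policy requires the local
    # datacenter to be set explicitly; "datacenter1" is a placeholder.
    load-balancing-policy.local-datacenter = "datacenter1"
  }
}
```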

[1] Update Cassandra driver to 4.x version · Issue #1510 · 
JanusGraph/janusgraph 
(github.com)
[2] 
https://stackoverflow.com/questions/34701817/what-is-the-most-efficient-way-to-map-transform-cast-a-cassandra-boundstatement
[3] 
https://docs.datastax.com/en/developer/java-driver/4.5/upgrade_guide/#object-mapper
[4] 
https://stackoverflow.com/questions/61298743/genericdao-on-datastax-java-driver-4
[5] cassandra - Strict column checking in Datastax java driver 4 causing 
problems - Stack 
Overflow
[6] https://issues.apache.org/jira/browse/CASSANDRA-15750
[7] 
https://javadoc.io/doc/org.apache.cassandra/cassandra-all/latest/org/apache/cassandra/hadoop/cql3/CqlInputFormat.html
[8] 
https://docs.datastax.com/en/developer/java-driver/4.10/manual/core/load_balancing/
[9] 
https://github.com/datastax/java-driver/tree/4.0.0/upgrade_guide#configuration
[10] 
https://docs.datastax.com/en/developer/java-driver/4.0/manual/core/configuration/
[11] 
https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/javaDrivers.html


Thanks
Anup

From: Alexey Romanenko 
Sent: Friday, April 16, 2021 11:02 PM
To: dev@beam.apache.org
Subject: Re: [PROPOSAL] Upgrade Cassandra driver from 3.x to 4.x in CassandraIO

Thank you for design doc and starting a discussion on mailing list!

I’m the next after Kenn to ask about the potential breaking changes with this 
upgrade. Could you elaborate 

Re: Should WindowFn have a mininal Duration?

2021-04-26 Thread Robert Bradshaw
I do think minimal window duration is a meaningful concept for
WindowFns, but from the pragmatic perspective I would ask is it useful
enough to require all implementers of WindowFn to specify it (given
that a default value of 0 would not be very useful).

On Mon, Apr 26, 2021 at 10:05 AM Jan Lukavský  wrote:
>
> Hi Kenn,
>
> On 4/26/21 5:59 PM, Kenneth Knowles wrote:
>
> In +Reza Rokni's example of looping timers, it is necessary to "seed" each 
> key, for just the reason you say. The looping timer itself for a key should 
> be in the global window. The outputs of the looping timer are windowed.
>
> Yes, exactly.
>
>
> All that said, your example seems possibly easier if you are OK with no 
> output for windows with no data.
>
> The problem is actually not with windows with no data. But with windows 
> containing only droppable data. This "toy example" is interestingly much more 
> complex than I expected. Pretty much due to the reason, that there is no 
> access to watermark while processing elements. But yes, there are probably 
> more efficient ways to solve that, the best option would be to have access to 
> the input watermark (e.g. at the start of the bundle, that seems to be well 
> defined, though I understand there is some negative experience with that 
> approach). But I don't want to discuss the solutions, actually.
>
> My "motivating example" was merely a motivation for me to ask this question 
> (and possible one more about side inputs is to follow :)), but - giving all 
> examples and possible solutions aside, the question is - is a minimal 
> duration an intrinsic property of a WindowFn, or not? If yes, I think there 
> are reasons to include this property into the model. If no, then we can 
> discuss the reason why is it the case. I see the only problem with 
> data-driven windows, all other windows are time-based and as such, probably 
> carry this property. The data-driven WindowFns could have this property 
> defined as zero. This is not a super critical request, more of a 
> philosophical discussion.
>
>  Jan
>
> It sounds like you don't actually want to drop the data, yes? You want to 
> partition elements at some time X that is in the middle of some event time 
> interval. If I understand your chosen approach, you could buffer the element 
> w/ metadata and set the timer in @ProcessElement. It is no problem if the 
> timestamp of the timer has already passed. It will fire immediately then. In 
> the @OnTimer you output from the buffer. I think there may be more efficient 
> ways to achieve this output.
>
> Kenn
>
> On Thu, Apr 22, 2021 at 2:48 AM Jan Lukavský  wrote:
>>
>> Hi,
>>
>> I have come across a "problem" while implementing some toy Pipeline. I
>> would like to split input PCollection into two parts - droppable data
>> (delayed for more than allowed lateness from the end of the window) from
>> the rest. I will not go into details, as that is not relevant, the
>> problem is that I need to set up something like a "looping timer" to be
>> able to create state for a window, even when there is no data yet (to
>> be able to set up a timer for the end of a window, to be able to recognize
>> droppable data). I would like the solution to be generic, so I would
>> like to "infer" the duration of the looping timer from the input
>> PCollection. What I would need is to know a _minimal guaranteed duration
>> of a window that a WindowFn can generate_. I would then set up the
>> looping timer to tick with an interval of this minimal duration, and that
>> would guarantee the timer will hit all the windows.
>>
>> I could try to infer this duration from the input windowing with some
>> hackish ways - e.g. using some "instanceof" approach, or by using the
>> WindowFn to generate a set of windows for some fixed timestamp (without
>> data element) and then infer the time from maxTimestamp of the returned
>> windows. That would probably break for sliding windows, because the
>> result would be the duration of the slide, not the duration of the
>> window (at least when doing naive computation).
>>
>> It seems to me that all WindowFns have such a minimal Duration -
>> obvious for Fixed Windows, but every other window type seems to have
>> such a property (including Sessions - that is the gap duration). The only
>> problem would be with data-driven windows, but we don't currently have
>> strong support for these.
>>
>> The question is then - would it make sense to introduce
>> WindowFn.getMinimalWindowDuration() to the model? The default value could be
>> zero, which would mean such a WindowFn would be unsupported in my
>> motivating example.
>>
>>   Jan
>>
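The tick-interval guarantee described in the quoted message can be sanity-checked without any Beam dependency. The following is a minimal plain-Java sketch (no Beam types; the class and method names are invented for illustration): a timer ticking every `tickInterval` milliseconds lands inside every window whose length is at least `tickInterval`.

```java
// Plain-Java sketch, no Beam dependency; names invented for illustration.
class LoopingTimerCoverage {

  /**
   * True iff some tick k * tickInterval (k = 0, 1, 2, ...) falls inside
   * the half-open window [start, start + windowLength).
   * Timestamps are non-negative millis.
   */
  static boolean windowIsHit(long start, long windowLength, long tickInterval) {
    // First tick at or after 'start'.
    long firstTick = ((start + tickInterval - 1) / tickInterval) * tickInterval;
    return firstTick < start + windowLength;
  }

  /** Checks the guarantee over a range of window start positions. */
  static boolean coverageHolds(long windowLength, long tickInterval) {
    for (long start = 0; start < 10_000; start++) {
      if (windowLength >= tickInterval
          && !windowIsHit(start, windowLength, tickInterval)) {
        return false;
      }
    }
    return true;
  }
}
```

Windows shorter than the tick interval can be missed entirely (e.g. a window [1, 3) with ticks every 5 ms), which is exactly why the *minimal* guaranteed window duration is the interesting quantity.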


Re: Portable Java Pipeline Support

2021-04-26 Thread Kyle Weaver
> For Samza Runner, we are looking to leverage java portable mode to
achieve “split deployment” where runner is independently packaged w/o user
code and user code should only exist in the submission/worker process. I
believe this is supported by portable mode and therefore we would prefer to
use LOOPBACK (for testing) and DOCKER (for production) mode.

Makes sense.

> Is there a way to get BEAM-12227
 prioritized or the
fastest way is to patch it ourselves?

Probably to patch it yourselves. I'd be happy to provide a review if you
need it though.

On Mon, Apr 26, 2021 at 10:47 AM Ke Wu  wrote:

> That makes sense.
>
> For Samza Runner, we are looking to leverage java portable mode to achieve
> “split deployment” where runner is independently packaged w/o user code and
> user code should only exist in the submission/worker process. I believe
> this is supported by portable mode and therefore we would prefer to use
> LOOPBACK (for testing) and DOCKER (for production) mode.
>
> Is there a way to get BEAM-12227
>  prioritized or the
> fastest way is to patch it ourselves?
>
> Best,
> Ke
>
>
> On Apr 26, 2021, at 10:17 AM, Kyle Weaver  wrote:
>
> The reason is the Flink and Spark runners are written in Java. So when the
> runner needs to execute user code written in Java, an EMBEDDED environment
> can be started in the runner. Whereas the runner cannot natively execute
> Python code, so it needs to call out to an external process. In the case of
> LOOPBACK, that external process is started by the Python client process
> that submitted the job in the first place.
>
> On Mon, Apr 26, 2021 at 9:57 AM Ke Wu  wrote:
>
>> Thank you Kyle, I have created BEAM-12227
>>  to track the
>> unimplemented exception.
>>
>> Is there any specific reason that Java tests are using EMBEDDED mode
>> while python usually in LOOPBACK mode?
>>
>> Best,
>> Ke
>>
>> On Apr 23, 2021, at 4:01 PM, Kyle Weaver  wrote:
>>
>> I couldn't find any existing ticket for this issue (you may be the first
>> to discover it). Feel free to create one with your findings. (FWIW I did
>> find a ticket for documenting portable Java pipelines [1]).
>>
>> For the Flink and Spark runners, we run most of our Java tests using
>> EMBEDDED mode. For portable Samza, you will likely want to use a similar
>> setup [2].
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-11062
>> [2]
>> https://github.com/apache/beam/blob/247915c66f6206249c31e6160b8b605a013d9f04/runners/spark/job-server/spark_job_server.gradle#L186
>>
>> On Fri, Apr 23, 2021 at 3:25 PM Ke Wu  wrote:
>>
>>> Thank you, Kyle, for the detailed answer.
>>>
>>> Do we have a ticket tracking a fix for the LOOPBACK mode? LOOPBACK mode
>>> will be essential, especially for local testing, as the Samza Runner
>>> adopts portable mode and we intend to run Java pipelines with it a lot.
>>>
>>> In addition, I noticed that this issue does not happen every time
>>> LOOPBACK is used, for example:
>>>
>>> Pipeline p = Pipeline.create(options);
>>>
>>> p.apply(Create.of(KV.of("1", 1L), KV.of("1", 2L), KV.of("2", 2L), 
>>> KV.of("2", 3L), KV.of("3", 9L)))
>>> .apply(Sum.longsPerKey())
>>> .apply(MapElements.via(new PrintFn()));
>>>
>>> p.run().waitUntilFinish();
>>>
>>> Where PrintFn simply prints the result:
>>>
>>> public static class PrintFn extends SimpleFunction<KV<String, Long>, String> {
>>>   @Override
>>>   public String apply(KV<String, Long> input) {
>>> LOG.info("Key {}: value {}", input.getKey(), input.getValue());
>>> return input.getKey() + ": " + input.getValue();
>>>   }
>>> }
>>>
>>>
>>> This simple pipeline did work in Java LOOPBACK mode.
>>>
>>> Best,
>>> Ke
>>>
>>> On Apr 23, 2021, at 1:16 PM, Kyle Weaver  wrote:
>>>
>>> Yes, we can expect to run java pipelines in portable mode. I'm guessing
>>> the method unimplemented exception is a bug, and we haven't caught it
>>> because (as far as I know) we don't test the Java loopback worker.
>>>
>>> As an alternative, you can try building the Java docker environment with
>>> "./gradlew :sdks:java:container:java8:docker" and then use
>>> "--defaultEnvironmentType=DOCKER" in your pipeline. But note that you won't
>>> be able to access the host filesystem [1].
>>>
>>> Another alternative is "--defaultEnvironmentType=EMBEDDED", but the
>>> embedded environment assumes the dependencies are already present on the
>>> runner, which will not be the case unless you modify the job server to
>>> depend on the examples module.
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-5440
>>>
>>> On Fri, Apr 23, 2021 at 11:24 AM Ke Wu  wrote:
>>>
 Hi All,

 I am working on adding portability support for the Samza Runner and have
 been playing around with the support in the Flink and Spark runners recently.

 One thing I noticed is the lack of documentation on how to run a java
 pipeline in a portable mode. Almost all document 

Re: Portable Java Pipeline Support

2021-04-26 Thread Ke Wu
That makes sense. 

For Samza Runner, we are looking to leverage java portable mode to achieve 
“split deployment” where runner is independently packaged w/o user code and 
user code should only exist in the submission/worker process. I believe this is 
supported by portable mode and therefore we would prefer to use LOOPBACK (for 
testing) and DOCKER (for production) mode.

Is there a way to get BEAM-12227 
 prioritized or the fastest 
way is to patch it ourselves?

Best,
Ke


> On Apr 26, 2021, at 10:17 AM, Kyle Weaver  wrote:
> 
> The reason is the Flink and Spark runners are written in Java. So when the 
> runner needs to execute user code written in Java, an EMBEDDED environment 
> can be started in the runner. Whereas the runner cannot natively execute 
> Python code, so it needs to call out to an external process. In the case of 
> LOOPBACK, that external process is started by the Python client process that 
> submitted the job in the first place.
> 
> On Mon, Apr 26, 2021 at 9:57 AM Ke Wu  > wrote:
> Thank you Kyle, I have created BEAM-12227 
>  to track the unimplemented 
> exception.
> 
> Is there any specific reason that Java tests are using EMBEDDED mode while 
> python usually in LOOPBACK mode?
> 
> Best,
> Ke
> 
>> On Apr 23, 2021, at 4:01 PM, Kyle Weaver > > wrote:
>> 
>> I couldn't find any existing ticket for this issue (you may be the first to 
>> discover it). Feel free to create one with your findings. (FWIW I did find a 
>> ticket for documenting portable Java pipelines [1]).
>> 
>> For the Flink and Spark runners, we run most of our Java tests using 
>> EMBEDDED mode. For portable Samza, you will likely want to use a similar 
>> setup [2].
>> 
>> [1] https://issues.apache.org/jira/browse/BEAM-11062 
>> 
>> [2] 
>> https://github.com/apache/beam/blob/247915c66f6206249c31e6160b8b605a013d9f04/runners/spark/job-server/spark_job_server.gradle#L186
>>  
>> 
>> On Fri, Apr 23, 2021 at 3:25 PM Ke Wu > > wrote:
>> Thank you, Kyle, for the detailed answer. 
>> 
>> Do we have a ticket tracking a fix for the LOOPBACK mode? LOOPBACK mode will 
>> be essential, especially for local testing, as the Samza Runner adopts 
>> portable mode and we intend to run Java pipelines with it a lot.
>> 
>> In addition, I noticed that this issue does not happen every time LOOPBACK 
>> is used, for example:
>> Pipeline p = Pipeline.create(options);
>> 
>> p.apply(Create.of(KV.of("1", 1L), KV.of("1", 2L), KV.of("2", 2L), KV.of("2", 
>> 3L), KV.of("3", 9L)))
>> .apply(Sum.longsPerKey())
>> .apply(MapElements.via(new PrintFn()));
>> 
>> p.run().waitUntilFinish();
>> Where PrintFn simply prints the result:
>> public static class PrintFn extends SimpleFunction<KV<String, Long>, String> {
>>   @Override
>>   public String apply(KV<String, Long> input) {
>> LOG.info("Key {}: value {}", input.getKey(), input.getValue());
>> return input.getKey() + ": " + input.getValue();
>>   }
>> }
>> 
>> This simple pipeline did work in Java LOOPBACK mode.
>> 
>> Best,
>> Ke
>> 
>>> On Apr 23, 2021, at 1:16 PM, Kyle Weaver >> > wrote:
>>> 
>>> Yes, we can expect to run java pipelines in portable mode. I'm guessing the 
>>> method unimplemented exception is a bug, and we haven't caught it because 
>>> (as far as I know) we don't test the Java loopback worker.
>>> 
>>> As an alternative, you can try building the Java docker environment with 
>>> "./gradlew :sdks:java:container:java8:docker" and then use 
>>> "--defaultEnvironmentType=DOCKER" in your pipeline. But note that you won't 
>>> be able to access the host filesystem [1].
>>> 
>>> Another alternative is "--defaultEnvironmentType=EMBEDDED", but the 
>>> embedded environment assumes the dependencies are already present on the 
>>> runner, which will not be the case unless you modify the job server to 
>>> depend on the examples module.
>>> 
>>> [1] https://issues.apache.org/jira/browse/BEAM-5440 
>>> 
>>> On Fri, Apr 23, 2021 at 11:24 AM Ke Wu >> > wrote:
>>> Hi All,
>>> 
>>> I am working on adding portability support for the Samza Runner and have 
>>> been playing around with the support in the Flink and Spark runners recently.
>>> 
>>> One thing I noticed is the lack of documentation on how to run a Java
>>> pipeline in portable mode. Almost all documentation focuses on how to run a
>>> Python pipeline, which is understandable. I believe a Java pipeline can be
>>> executed in portable mode as well, so I did some experiments, but the
>>> results were not what I expected; I would like to know whether they are:
>>> 
>>> 
>>> 1. Add portability module to example 

Re: [PROPOSAL] Preparing for Beam 2.30.0 release

2021-04-26 Thread Robert Bradshaw
Confirming that the cut date is 4/28/2021 (in two days), right?

On Wed, Apr 21, 2021 at 4:41 PM Tomo Suzuki  wrote:
>
> Thank you for the preparation!
>
> > a few responses that some high priority changes
>
> Would you be willing to share the items for visibility?

There are several PRs in flight (or recently merged) to get
portability working well with Dataflow for this release.

>
> On Wed, Apr 21, 2021 at 7:21 PM Kenneth Knowles  wrote:
> >
> > Also the 2.29.0 was re-cut.
> >
> > Usually a delay in one release should not delay the next release, because 
> > each release represents a certain quantity of changes. But in this case, 
> > the actual quantity of changes is affected by the re-cut, too.
> >
> > On Wed, Apr 21, 2021 at 4:12 PM Heejong Lee  wrote:
> >>
> >> Update on the 2.30.0 branch cut schedule:
> >>
> >> I'm thinking of delaying the branch cut a week since I've got a few 
> >> responses that some high priority changes are still ongoing.
> >>
> >> The new cut date is April 28.
> >>
> >>
> >> On Tue, Apr 20, 2021 at 6:07 PM Ahmet Altay  wrote:
> >>>
> >>> +1 and thank you!
> >>>
> >>> On Tue, Apr 20, 2021 at 4:55 PM Heejong Lee  wrote:
> 
>  Hi All,
> 
>  Beam 2.30.0 release is scheduled to be cut on April 21 according to the 
>  release calendar [1]
> 
>  I'd like to volunteer myself to be the release manager for this release. 
>  I plan on cutting the release branch on the scheduled date.
> 
>  Any comments or objections?
> 
>  Thanks,
>  Heejong
> 
>  [1] 
>  https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com=America/Los_Angeles
>
>
>
> --
> Regards,
> Tomo


Re: Portable Java Pipeline Support

2021-04-26 Thread Kyle Weaver
The reason is the Flink and Spark runners are written in Java. So when the
runner needs to execute user code written in Java, an EMBEDDED environment
can be started in the runner. Whereas the runner cannot natively execute
Python code, so it needs to call out to an external process. In the case of
LOOPBACK, that external process is started by the Python client process
that submitted the job in the first place.

On Mon, Apr 26, 2021 at 9:57 AM Ke Wu  wrote:

> Thank you Kyle, I have created BEAM-12227
>  to track the
> unimplemented exception.
>
> Is there any specific reason that Java tests are using EMBEDDED mode while
> python usually in LOOPBACK mode?
>
> Best,
> Ke
>
> On Apr 23, 2021, at 4:01 PM, Kyle Weaver  wrote:
>
> I couldn't find any existing ticket for this issue (you may be the first
> to discover it). Feel free to create one with your findings. (FWIW I did
> find a ticket for documenting portable Java pipelines [1]).
>
> For the Flink and Spark runners, we run most of our Java tests using
> EMBEDDED mode. For portable Samza, you will likely want to use a similar
> setup [2].
>
> [1] https://issues.apache.org/jira/browse/BEAM-11062
> [2]
> https://github.com/apache/beam/blob/247915c66f6206249c31e6160b8b605a013d9f04/runners/spark/job-server/spark_job_server.gradle#L186
>
> On Fri, Apr 23, 2021 at 3:25 PM Ke Wu  wrote:
>
>> Thank you, Kyle, for the detailed answer.
>>
>> Do we have a ticket tracking a fix for the LOOPBACK mode? LOOPBACK mode
>> will be essential, especially for local testing, as the Samza Runner
>> adopts portable mode and we intend to run Java pipelines with it a lot.
>>
>> In addition, I noticed that this issue does not happen every time
>> LOOPBACK is used, for example:
>>
>> Pipeline p = Pipeline.create(options);
>>
>> p.apply(Create.of(KV.of("1", 1L), KV.of("1", 2L), KV.of("2", 2L), KV.of("2", 
>> 3L), KV.of("3", 9L)))
>> .apply(Sum.longsPerKey())
>> .apply(MapElements.via(new PrintFn()));
>>
>> p.run().waitUntilFinish();
>>
>> Where PrintFn simply prints the result:
>>
>> public static class PrintFn extends SimpleFunction<KV<String, Long>, String> {
>>   @Override
>>   public String apply(KV<String, Long> input) {
>> LOG.info("Key {}: value {}", input.getKey(), input.getValue());
>> return input.getKey() + ": " + input.getValue();
>>   }
>> }
>>
>>
>> This simple pipeline did work in Java LOOPBACK mode.
>>
>> Best,
>> Ke
>>
>> On Apr 23, 2021, at 1:16 PM, Kyle Weaver  wrote:
>>
>> Yes, we can expect to run java pipelines in portable mode. I'm guessing
>> the method unimplemented exception is a bug, and we haven't caught it
>> because (as far as I know) we don't test the Java loopback worker.
>>
>> As an alternative, you can try building the Java docker environment with
>> "./gradlew :sdks:java:container:java8:docker" and then use
>> "--defaultEnvironmentType=DOCKER" in your pipeline. But note that you won't
>> be able to access the host filesystem [1].
>>
>> Another alternative is "--defaultEnvironmentType=EMBEDDED", but the
>> embedded environment assumes the dependencies are already present on the
>> runner, which will not be the case unless you modify the job server to
>> depend on the examples module.
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-5440
>>
>> On Fri, Apr 23, 2021 at 11:24 AM Ke Wu  wrote:
>>
>>> Hi All,
>>>
>>> I am working on adding portability support for the Samza Runner and have
>>> been playing around with the support in the Flink and Spark runners recently.
>>>
>>> One thing I noticed is the lack of documentation on how to run a Java
>>> pipeline in portable mode. Almost all documentation focuses on how to run a
>>> Python pipeline, which is understandable. I believe a Java pipeline can be
>>> executed in portable mode as well, so I did some experiments, but the
>>> results were not what I expected; I would like to know whether they are:
>>>
>>>
>>> 1. Add portability module to example so PipelineOptionsFactory can
>>> recognize PortableRunner:
>>>
>>> $ git diff
>>> diff --git a/examples/java/build.gradle b/examples/java/build.gradle
>>> index 62f15ec24b..c9069d3f4f 100644
>>> --- a/examples/java/build.gradle
>>> +++ b/examples/java/build.gradle
>>> @@ -59,6 +59,7 @@ dependencies {
>>>compile project(":sdks:java:extensions:google-cloud-platform-core")
>>>compile project(":sdks:java:io:google-cloud-platform")
>>>compile project(":sdks:java:io:kafka")
>>> +  compile project(":runners:portability:java")
>>>compile project(":sdks:java:extensions:ml")
>>>compile library.java.avro
>>>compile library.java.bigdataoss_util
>>>
>>>
>>> 2. Bring up the Job Server:
>>>
>>> Spark: ./gradlew :runners:spark:3:job-server:runShadow
>>> Flink: ./gradlew :runners:flink:1.12:job-server:runShadow
>>>
>>> 3. Execute WordCount example:
>>>
>>> ./gradlew execute -DmainClass=org.apache.beam.examples.WordCount
>>> -Dexec.args="--inputFile=README.md --output=/tmp/output
>>> --runner=PortableRunner --jobEndpoint=localhost:8099
>>> 

Re: Contributor permissions for Beam Jira tickets

2021-04-26 Thread Ismaël Mejía
Done, you are now a contributor and I assigned BEAM-12225 to you.
Welcome to Beam, and don't feel bad about your accents; I have to deal with
the same issues regularly :)

Regards,
Ismaël Mejía


On Mon, Apr 26, 2021 at 6:22 PM Rafal Ochyra  wrote:

> Hi,
>
> I have created an account on Beam Jira to report an issue that I would
> like to work on. Link to created issue:
> https://issues.apache.org/jira/browse/BEAM-12225.
> My username is: Rafał Ochyra. Forgive me for using "ł" and space in
> there... Maybe it would be worth changing it if possible.
> Could you please add me as a contributor in the Beam issue tracker so I
> will be able to assign myself to this task (after triage) and, potentially,
> other tasks in the future?
>
> Best regards,
> Rafał
>
> Notice:
> This email is confidential and may contain copyright material of members
> of the Ocado Group. Opinions and views expressed in this message may not
> necessarily reflect the opinions and views of the members of the Ocado
> Group.
>
> If you are not the intended recipient, please notify us immediately and
> delete all copies of this message. Please note that it is your
> responsibility to scan this message for viruses.
>
> References to the "Ocado Group" are to Ocado Group plc (registered in
> England and Wales with number 7098618) and its subsidiary undertakings (as
> that expression is defined in the Companies Act 2006) from time to time.
> The registered office of Ocado Group plc is Buildings One & Two, Trident
> Place, Mosquito Way, Hatfield, Hertfordshire, AL10 9UL.
>


Re: Should WindowFn have a minimal Duration?

2021-04-26 Thread Jan Lukavský

Hi Kenn,

On 4/26/21 5:59 PM, Kenneth Knowles wrote:
In +Reza Rokni 's example of looping 
timers, it is necessary to "seed" each key, for just the reason you 
say. The looping timer itself for a key should be in the global 
window. The outputs of the looping timer are windowed.

Yes, exactly.


All that said, your example seems possibly easier if you are OK with 
no output for windows with no data.


The problem is actually not with windows with no data, but with windows 
containing only droppable data. This "toy example" is interestingly much 
more complex than I expected, pretty much because there is no access to 
the watermark while processing elements. But yes, there are probably 
more efficient ways to solve that; the best option would be to have 
access to the input watermark (e.g. at the start of the bundle, which 
seems to be well defined, though I understand there is some negative 
experience with that approach). But I don't want to discuss the 
solutions, actually.


My "motivating example" was merely a motivation for me to ask this 
question (and possibly one more about side inputs is to follow :)), but 
- setting all examples and possible solutions aside, the question is: is 
a minimal duration an intrinsic property of a WindowFn, or not? If yes, 
I think there are reasons to include this property in the model. If no, 
then we can discuss the reason why that is the case. I see a problem 
only with data-driven windows; all other windows are time-based and, as 
such, probably carry this property. The data-driven WindowFns could 
have this property defined as zero. This is not a super critical 
request, more of a philosophical discussion.


 Jan

It sounds like you don't actually want to drop the data, yes? You want 
to partition elements at some time X that is in the middle of some 
event time interval. If I understand your chosen approach, you could 
buffer the element w/ metadata and set the timer in @ProcessElement. 
It is no problem if the timestamp of the timer has already passed. It 
will fire immediately then. In the @OnTimer you output from the 
buffer. I think there may be more efficient ways to achieve this output.


Kenn

On Thu, Apr 22, 2021 at 2:48 AM Jan Lukavský > wrote:


Hi,

I have come across a "problem" while implementing some toy
Pipeline. I
would like to split input PCollection into two parts - droppable data
(delayed for more than allowed lateness from the end of the
window) from
the rest. I will not go into details, as that is not relevant, the
problem is that I need to set up something like a "looping timer" to be
able to create state for a window, even when there is no data yet (to
be able to set up a timer for the end of a window, to be able to
recognize
droppable data). I would like the solution to be generic, so I would
like to "infer" the duration of the looping timer from the input
PCollection. What I would need is to know a _minimal guaranteed
duration
of a window that a WindowFn can generate_. I would then set up the
looping timer to tick with an interval of this minimal duration, and that
would guarantee the timer will hit all the windows.

I could try to infer this duration from the input windowing with some
hackish ways - e.g. using some "instanceof" approach, or by using the
WindowFn to generate a set of windows for some fixed timestamp (without
data element) and then infer the time from maxTimestamp of the
returned
windows. That would probably break for sliding windows, because the
result would be the duration of the slide, not the duration of the
window (at least when doing naive computation).
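The sliding-window pitfall mentioned in the quoted paragraph is easy to see with a little arithmetic. The following is a plain-Java model of sliding-window assignment (no Beam dependency; `SlidingWindowMath` and its method names are invented for this sketch): diffing the maxTimestamps of two windows assigned to the same element yields the period (slide), not the window size.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of sliding-window assignment; names invented for
// illustration. All durations/timestamps are in millis.
class SlidingWindowMath {

  /** Start timestamps of all sliding windows [start, start + size) containing t. */
  static List<Long> windowStartsFor(long t, long size, long period) {
    List<Long> starts = new ArrayList<>();
    long lastStart = t - Math.floorMod(t, period); // latest window start <= t
    for (long start = lastStart; start > t - size; start -= period) {
      starts.add(start);
    }
    return starts;
  }

  /**
   * Naive inference: difference between the maxTimestamps of two adjacent
   * windows assigned to the same element (requires size > period, so that
   * at least two windows are assigned).
   */
  static long naiveInferredDuration(long t, long size, long period) {
    List<Long> starts = windowStartsFor(t, size, period);
    // maxTimestamp of a window starting at s is s + size - 1, so adjacent
    // windows' maxTimestamps differ by exactly the period, not the size.
    return (starts.get(0) + size - 1) - (starts.get(1) + size - 1);
  }
}
```

For example, with a 10-minute size and a 1-minute period, each element is assigned to 10 windows and the naive diff comes out as 1 minute, even though every individual window is 10 minutes long.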

It seems to me that all WindowFns have such a minimal Duration -
obvious for Fixed Windows, but every other window type seems to have
such a property (including Sessions - that is the gap duration). The only
problem would be with data-driven windows, but we don't currently have
strong support for these.

The question is then - would it make sense to introduce
WindowFn.getMinimalWindowDuration() to the model? The default value could be
zero, which would mean such a WindowFn would be unsupported in my
motivating example.

  Jan



Re: Portable Java Pipeline Support

2021-04-26 Thread Ke Wu
Thank you Kyle, I have created BEAM-12227 
 to track the unimplemented 
exception.

Is there any specific reason that Java tests are using EMBEDDED mode while 
python usually in LOOPBACK mode?

Best,
Ke

> On Apr 23, 2021, at 4:01 PM, Kyle Weaver  wrote:
> 
> I couldn't find any existing ticket for this issue (you may be the first to 
> discover it). Feel free to create one with your findings. (FWIW I did find a 
> ticket for documenting portable Java pipelines [1]).
> 
> For the Flink and Spark runners, we run most of our Java tests using EMBEDDED 
> mode. For portable Samza, you will likely want to use a similar setup [2].
> 
> [1] https://issues.apache.org/jira/browse/BEAM-11062 
> 
> [2] 
> https://github.com/apache/beam/blob/247915c66f6206249c31e6160b8b605a013d9f04/runners/spark/job-server/spark_job_server.gradle#L186
>  
> 
> On Fri, Apr 23, 2021 at 3:25 PM Ke Wu  > wrote:
> Thank you, Kyle, for the detailed answer. 
> 
> Do we have a ticket tracking a fix for the LOOPBACK mode? LOOPBACK mode will 
> be essential, especially for local testing, as the Samza Runner adopts 
> portable mode and we intend to run Java pipelines with it a lot.
> 
> In addition, I noticed that this issue does not happen every time LOOPBACK is 
> used, for example:
> Pipeline p = Pipeline.create(options);
> 
> p.apply(Create.of(KV.of("1", 1L), KV.of("1", 2L), KV.of("2", 2L), KV.of("2", 
> 3L), KV.of("3", 9L)))
> .apply(Sum.longsPerKey())
> .apply(MapElements.via(new PrintFn()));
> 
> p.run().waitUntilFinish();
> Where PrintFn simply prints the result:
> public static class PrintFn extends SimpleFunction<KV<String, Long>, String> {
>   @Override
>   public String apply(KV<String, Long> input) {
> LOG.info("Key {}: value {}", input.getKey(), input.getValue());
> return input.getKey() + ": " + input.getValue();
>   }
> }
> 
> This simple pipeline did work in Java LOOPBACK mode.
> 
> Best,
> Ke
> 
>> On Apr 23, 2021, at 1:16 PM, Kyle Weaver > > wrote:
>> 
>> Yes, we can expect to run java pipelines in portable mode. I'm guessing the 
>> method unimplemented exception is a bug, and we haven't caught it because 
>> (as far as I know) we don't test the Java loopback worker.
>> 
>> As an alternative, you can try building the Java docker environment with 
>> "./gradlew :sdks:java:container:java8:docker" and then use 
>> "--defaultEnvironmentType=DOCKER" in your pipeline. But note that you won't 
>> be able to access the host filesystem [1].
>> 
>> Another alternative is "--defaultEnvironmentType=EMBEDDED", but the embedded 
>> environment assumes the dependencies are already present on the runner, 
>> which will not be the case unless you modify the job server to depend on the 
>> examples module.
>> 
>> [1] https://issues.apache.org/jira/browse/BEAM-5440 
>> 
>> On Fri, Apr 23, 2021 at 11:24 AM Ke Wu > > wrote:
>> Hi All,
>> 
>> I am working on adding portability support for the Samza Runner and have 
>> been playing around with the support in the Flink and Spark runners recently.
>> 
>> One thing I noticed is the lack of documentation on how to run a Java 
>> pipeline in portable mode. Almost all documentation focuses on how to run a 
>> Python pipeline, which is understandable. I believe a Java pipeline can be 
>> executed in portable mode as well, so I did some experiments, but the 
>> results were not what I expected; I would like to know whether they are:
>> 
>> 
>> 1. Add portability module to example so PipelineOptionsFactory can recognize 
>> PortableRunner:
>> $ git diff
>> diff --git a/examples/java/build.gradle b/examples/java/build.gradle
>> index 62f15ec24b..c9069d3f4f 100644
>> --- a/examples/java/build.gradle
>> +++ b/examples/java/build.gradle
>> @@ -59,6 +59,7 @@ dependencies {
>>compile project(":sdks:java:extensions:google-cloud-platform-core")
>>compile project(":sdks:java:io:google-cloud-platform")
>>compile project(":sdks:java:io:kafka")
>> +  compile project(":runners:portability:java")
>>compile project(":sdks:java:extensions:ml")
>>compile library.java.avro
>>compile library.java.bigdataoss_util
>> 
>> 2. Bring up the Job Server: 
>> 
>>  Spark: ./gradlew :runners:spark:3:job-server:runShadow
>>  Flink: ./gradlew :runners:flink:1.12:job-server:runShadow
>> 
>> 3. Execute WordCount example:
>> 
>> ./gradlew execute -DmainClass=org.apache.beam.examples.WordCount 
>> -Dexec.args="--inputFile=README.md --output=/tmp/output 
>> --runner=PortableRunner --jobEndpoint=localhost:8099 
>> --defaultEnvironmentType=LOOPBACK"
>> 
>> 
>> Neither Flink or Spark runner worked for WordCount because of 
>> 
>> org.apache.beam.vendor.grpc.v1p26p0.io.grpc.StatusRuntimeException: 
>> 

Contributor permissions for Beam Jira tickets

2021-04-26 Thread Rafal Ochyra
Hi,

I have created an account on Beam Jira to report an issue that I would
like to work on. Link to created issue:
https://issues.apache.org/jira/browse/BEAM-12225.
My username is: Rafał Ochyra. Forgive me for using "ł" and space in
there... Maybe it would be worth changing it if possible.
Could you please add me as a contributor in the Beam issue tracker so I
will be able to assign myself to this task (after triage) and, potentially,
other tasks in the future?

Best regards,
Rafał



Re: Should WindowFn have a minimal Duration?

2021-04-26 Thread Kenneth Knowles
In +Reza Rokni 's example of looping timers, it is
necessary to "seed" each key, for just the reason you say. The looping
timer itself for a key should be in the global window. The outputs of the
looping timer are windowed.

All that said, your example seems possibly easier if you are OK with no
output for windows with no data. It sounds like you don't actually want to
drop the data, yes? You want to partition elements at some time X that is
in the middle of some event time interval. If I understand your chosen
approach, you could buffer the element w/ metadata and set the timer
in @ProcessElement. It is no problem if the timestamp of the timer has
already passed. It will fire immediately then. In the @OnTimer you output
from the buffer. I think there may be more efficient ways to achieve this
output.

Kenn
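The buffer-and-flush-on-timer pattern described above can be modeled outside Beam as well. The toy class below is plain Java with invented names (not Beam's state/timer API), mimicking the semantics Kenn describes: elements are buffered in the @ProcessElement analogue, and a timer whose timestamp is already at or behind the watermark fires immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the buffer-and-timer pattern; names invented for
// illustration. Mirrors event-time timer semantics: a timer set for a
// timestamp at or before the current watermark fires right away.
class BufferAndTimerModel {
  private final List<String> buffer = new ArrayList<>();
  private final List<String> output = new ArrayList<>();
  private long watermark = Long.MIN_VALUE;
  private long pendingTimer = Long.MAX_VALUE; // MAX_VALUE = no timer set

  /** @ProcessElement analogue: buffer the element and set a timer. */
  void processElement(String element, long timerTimestamp) {
    buffer.add(element);
    pendingTimer = Math.min(pendingTimer, timerTimestamp);
    maybeFire(); // a timer set in the past fires immediately
  }

  /** Watermark advance, as a runner would deliver it. */
  void advanceWatermark(long newWatermark) {
    watermark = Math.max(watermark, newWatermark);
    maybeFire();
  }

  /** @OnTimer analogue: flush the buffer to output. */
  private void maybeFire() {
    if (pendingTimer <= watermark) {
      output.addAll(buffer);
      buffer.clear();
      pendingTimer = Long.MAX_VALUE;
    }
  }

  List<String> output() { return output; }
}
```

In real Beam Java code the buffer would live in a `@StateId` bag and the timer in a `@TimerId` event-time timer; this sketch only captures the firing semantics.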

On Thu, Apr 22, 2021 at 2:48 AM Jan Lukavský  wrote:

> Hi,
>
> I have come across a "problem" while implementing some toy Pipeline. I
> would like to split input PCollection into two parts - droppable data
> (delayed for more than allowed lateness from the end of the window) from
> the rest. I will not go into details, as that is not relevant, the
> problem is that I need to set up something like a "looping timer" to be
> able to create state for a window, even when there is no data yet (to
> be able to set up a timer for the end of a window, to be able to recognize
> droppable data). I would like the solution to be generic, so I would
> like to "infer" the duration of the looping timer from the input
> PCollection. What I would need is to know a _minimal guaranteed duration
> of a window that a WindowFn can generate_. I would then set up the
> looping timer to tick with an interval of this minimal duration, and that
> would guarantee the timer will hit all the windows.
>
> I could try to infer this duration from the input windowing with some
> hackish ways - e.g. using some "instanceof" approach, or by using the
> WindowFn to generate a set of windows for some fixed timestamp (without
> data element) and then infer the time from maxTimestamp of the returned
> windows. That would probably break for sliding windows, because the
> result would be the duration of the slide, not the duration of the
> window (at least when doing naive computation).
>
> It seems to me that all WindowFns have such a minimal Duration - it is
> obvious for fixed windows, but every other window type seems to have
> this property as well (including Sessions, where it is the gap
> duration). The only problem would be data-driven windows, but we
> currently don't have strong support for those.
>
> The question is then - would it make sense to introduce
> WindowFn.getMinimalWindowDuration() to the model? The default value
> could be zero, which would mean such a WindowFn is unsupported in my
> motivating example.
>
>   Jan
>
>
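The minimal-duration idea above can be sketched as follows. This is a plain-Python illustration, not the Beam WindowFn API; the window-kind names and helper functions are assumptions. The point is only that a looping timer ticking at the minimal guaranteed window duration lands in every window at least once:

```python
# Sketch of "minimal guaranteed window duration" per windowing strategy.
# Kind names and durations are illustrative stand-ins for Beam WindowFns.

def minimal_window_duration(kind, size=None, slide=None, gap=None):
    if kind == "fixed":
        return size   # every fixed window is exactly `size` long
    if kind == "sliding":
        return size   # windows overlap, but each still spans `size`
    if kind == "sessions":
        return gap    # a session lasts at least the gap duration
    return 0          # unknown / data-driven windows: unsupported

def timer_hits_every_window(window_start, window_end, tick_period):
    # A looping timer ticking at multiples of tick_period lands in
    # [window_start, window_end) whenever tick_period <= window length.
    first_tick_at_or_after_start = -(-window_start // tick_period) * tick_period
    return first_tick_at_or_after_start < window_end

# Sliding windows of size 60 sliding every 10: naive inference from window
# maxTimestamps would suggest 10, but the guaranteed duration is 60.
d = minimal_window_duration("sliding", size=60, slide=10)
assert all(timer_hits_every_window(s, s + 60, d) for s in range(0, 600, 10))
```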


Re: Contributor Permission for Beam on Jira

2021-04-26 Thread Jenny Xu
Haha thank you Kenneth!

On Fri, Apr 23, 2021 at 12:52 PM Kenneth Knowles  wrote:

> Welcome! Love the Jira handle.
>
> Kenn
>
> On Fri, Apr 23, 2021 at 8:28 AM Jenny Xu  wrote:
>
>> Thanks Alexey!
>>
>> Weiwen
>>
>> On Fri, Apr 23, 2021 at 11:24 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Done.
>>>
>>> Welcome to Beam, Weiwen!
>>>
>>> ---
>>> Alexey
>>>
>>> On 23 Apr 2021, at 17:16, Jenny Xu  wrote:
>>>
>>> Hi,
>>>
>>> This is Weiwen Xu. Could someone please add me as a contributor for
>>> Beam's Jira issue tracker? My Jira ID is JenX.
>>>
>>> Thanks,
>>> Weiwen
>>>
>>>
>>>


[RESULT] [VOTE] Release 2.29.0, release candidate #1

2021-04-26 Thread Kenneth Knowles
I'm happy to announce that we have approved the 2.29.0 release.

There are 11 approving votes, 6 of which are binding. There are no
disapproving votes.

Binding votes:
 - Ahmet Altay
 - Robert Bradshaw
 - Pablo Estrada
 - Chamikara Jayalath
 - Kenneth Knowles
 - Jean-Baptiste Onofre

Non-binding votes:
 - Tyson Hamilton
 - Brian Hulette
 - Jarek Potiuk
 - Valentyn Tymofieiev
 - Kyle Weaver

Thanks everyone! I will finalize the release.

Kenn

On Sun, Apr 25, 2021 at 3:16 PM Kenneth Knowles  wrote:

> I did an additional round of making sure the human-readable quickstart
> instructions also succeed.
>
> Kenn
>
> On Thu, Apr 22, 2021 at 6:47 PM Ahmet Altay  wrote:
>
>> +1 (binding)
>>
>> I ran some python quick start examples. Most validations in the sheet
>> were already done :) Thank you all!
>>
>> On Thu, Apr 22, 2021 at 9:15 AM Kyle Weaver  wrote:
>>
>>> +1 (non-binding)
>>>
>>> Ran Python wordcount on Flink and Spark.
>>>
>>> On Wed, Apr 21, 2021 at 5:20 PM Brian Hulette 
>>> wrote:
>>>
 +1 (non-binding)

 I ran a python pipeline exercising the DataFrame API, and another
 exercising SQLTransform in Python, both on Dataflow.

 On Wed, Apr 21, 2021 at 12:55 PM Kenneth Knowles 
 wrote:

> Since the artifacts were changed about 26 hours ago, I intend to leave
> this vote open until 46 hours from now. Specifically, around noon my time
> (US Pacific) on Friday I will close the vote and finalize the release, if
> no problems are discovered.
>
> Kenn
>
> On Wed, Apr 21, 2021 at 12:52 PM Kenneth Knowles 
> wrote:
>
>> +1 (binding)
>>
>> I ran the script at
>> https://beam.apache.org/contribute/release-guide/#run-validations-using-run_rc_validationsh
>> except for the part that requires a GitHub PR, since Cham already did 
>> that
>> part.
>>
>> Kenn
>>
>> On Wed, Apr 21, 2021 at 12:11 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>>
>>> +1, verified that my previous findings are fixed.
>>>
>>> On Wed, Apr 21, 2021 at 8:17 AM Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
 +1 (binding)

 Ran some Python scenarios and updated the spreadsheet.

 Thanks,
 Cham

 On Tue, Apr 20, 2021 at 3:39 PM Kenneth Knowles 
 wrote:

>
>
> On Tue, Apr 20, 2021 at 3:24 PM Robert Bradshaw <
> rober...@google.com> wrote:
>
>> The artifacts and signatures look good to me. +1 (binding)
>>
>> (The release branch still has the .dev name, maybe you didn't
>> push?
>> https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/version.py
>> )
>>
>
> Good point. I'll highlight that I finally implemented the
> branching changes from
> https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E
>
> The new guide with diagram is here:
> https://beam.apache.org/contribute/release-guide/#tag-a-chosen-commit-for-the-rc
>
> TL;DR:
>  - the release branch continues to be dev/SNAPSHOT for 2.29.0
> while the main branch is now dev/SNAPSHOT for 2.30.0
>  - the RC tag v2.29.0-RC1 no longer lies on the release branch. It
> is a single tagged commit that removes the dev/SNAPSHOT suffix
>
> Kenn
>
>
>> On Tue, Apr 20, 2021 at 10:36 AM Kenneth Knowles 
>> wrote:
>>
>>> Please take another look.
>>>
>>>  - I re-ran the RC creation script so the source release and
>>> wheels are new and built from the RC tag. I confirmed the source 
>>> zip and
>>> wheels have version 2.29.0 (not .dev or -SNAPSHOT).
>>>  - I fixed and rebuilt Dataflow worker container images from
>>> exactly the RC commit, added dataclasses, with internal changes to 
>>> get the
>>> version to match.
>>>  - I confirmed that the staged jars already have version 2.29.0
>>> (not -SNAPSHOT).
>>>  - I confirmed with `diff -r -q` that the source tarball matches
>>> the RC tag (minus the .git* files and directories and gradlew)
>>>
>>> Kenn
>>>
>>> On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles 
>>> wrote:
>>>
 At this point, the release train has just about come around to
 2.30.0 which will pick up that change. I don't think it makes 
 sense to
 cherry-pick anything more into 2.29.0 unless it is nonfunctional. 
 As it is,
 I think we have a good commit and just need to build the expected
 artifacts. Since it isn't all the artifacts, I was planning on just
 overwriting the RC1 

Beam Dependency Check Report (2021-04-26)

2021-04-26 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:

Dependency Name | Current Version | Latest Version | Release Date Of Current Version | Release Date Of Latest Release | JIRA Issue
chromedriver-binary | 88.0.4324.96.0 | 91.0.4472.19.0 | 2021-01-25 | 2021-04-26 | BEAM-10426
dill | 0.3.1.1 | 0.3.3 | 2019-10-07 | 2020-11-02 | BEAM-11167
google-cloud-bigtable | 1.7.0 | 2.1.0 | 2021-04-12 | 2021-04-26 | BEAM-8127
google-cloud-datastore | 1.15.3 | 2.1.1 | 2020-11-16 | 2021-04-26 | BEAM-8443
google-cloud-dlp | 1.0.0 | 3.0.1 | 2020-06-29 | 2021-02-01 | BEAM-10344
google-cloud-language | 1.3.0 | 2.0.0 | 2020-10-26 | 2020-10-26 | BEAM-8
google-cloud-pubsub | 1.7.0 | 2.4.1 | 2020-07-20 | 2021-04-05 | BEAM-5539
google-cloud-spanner | 1.19.1 | 3.3.0 | 2020-11-16 | 2021-03-29 | BEAM-10345
google-cloud-videointelligence | 1.16.1 | 2.1.0 | 2020-11-23 | 2021-04-05 | BEAM-11319
google-cloud-vision | 1.0.0 | 2.3.1 | 2020-03-24 | 2021-04-19 | BEAM-9581
grpcio-tools | 1.30.0 | 1.37.0 | 2020-06-29 | 2021-04-12 | BEAM-9582
idna | 2.10 | 3.1 | 2021-01-04 | 2021-01-11 | BEAM-9328
mock | 2.0.0 | 4.0.3 | 2019-05-20 | 2020-12-14 | BEAM-7369
mypy-protobuf | 1.18 | 2.4 | 2020-03-24 | 2021-02-08 | BEAM-10346
nbconvert | 5.6.1 | 6.0.7 | 2020-10-05 | 2020-10-05 | BEAM-11007
Pillow | 7.2.0 | 8.2.0 | 2020-10-19 | 2021-04-05 | BEAM-11071
PyHamcrest | 1.10.1 | 2.0.2 | 2020-01-20 | 2020-07-08 | BEAM-9155
pytest | 4.6.11 | 6.2.3 | 2020-07-08 | 2021-04-05 | BEAM-8606
pytest-xdist | 1.34.0 | 2.2.1 | 2020-08-17 | 2021-02-15 | BEAM-10713
tenacity | 5.1.5 | 7.0.0 | 2019-11-11 | 2021-03-08 | BEAM-8607
High Priority Dependency Updates Of Beam Java SDK:

Dependency Name | Current Version | Latest Version | Release Date Of Current Version | Release Date Of Latest Release | JIRA Issue
com.azure:azure-core | 1.6.0 | 1.15.0 | 2020-07-02 | 2021-04-02 | BEAM-11888
com.azure:azure-identity | 1.0.8 | 1.3.0-beta.2 | 2020-07-07 | 2021-03-11 | BEAM-11814
com.azure:azure-storage-common | 12.8.0 | 12.11.0-beta.3 | 2020-08-13 | 2021-04-16 | BEAM-11889
com.datastax.cassandra:cassandra-driver-core | 3.10.2 | 4.0.0 | 2020-08-26 | 2019-03-18 | BEAM-8674
com.esotericsoftware:kryo | 4.0.2 | 5.1.0 | 2018-03-20 | 2021-04-08 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.33.0 | 0.38.0 | 2020-09-14 | 2021-03-08 | BEAM-6645
com.google.api:gax | 1.56.0 | 1.63.0 | 2020-04-06 | 2021-04-07 | BEAM-10348
com.google.api.grpc:proto-google-cloud-bigquerystorage-v1 | 1.17.0 | 1.20.0 | 2021-03-30 | 2021-04-21 | BEAM-11890
com.google.api.grpc:proto-google-cloud-bigquerystorage-v1beta2 | 0.117.0 | 0.120.0 | 2021-03-30 | 2021-04-21 | BEAM-11891
com.google.api.grpc:proto-google-cloud-dlp-v2 | 1.1.4 | 2.3.1 | 2020-05-04 | 2021-04-12 | BEAM-11892
com.google.api.grpc:proto-google-cloud-video-intelligence-v1 | 1.2.0 | 1.6.1 | 2020-03-10 | 2021-04-09 | BEAM-11894
com.google.api.grpc:proto-google-cloud-vision-v1 | 1.81.3 | 1.102.1 | 2020-04-07 | 2021-04-12 | BEAM-11895
com.google.apis:google-api-services-cloudresourcemanager | v1-rev20210331-1.31.0 | v3-rev20210411-1.31.0 | 2021-04-09 | 2021-04-20 | BEAM-8751
com.google.apis:google-api-services-dataflow | v1b3-rev20210408-1.31.0 | v1beta3-rev12-1.20.0 | 2021-04-17 | 2015-04-29 | BEAM-8752
com.google.apis:google-api-services-healthcare | v1beta1-rev20210407-1.31.0 | v1-rev20210407-1.31.0 | 2021-04-21 | 2021-04-21 | BEAM-10349
com.google.auto.service:auto-service | 1.0-rc6 | 1.0 | 2019-07-16 | 2021-04-06 | BEAM-5541
com.google.auto.service:auto-service-annotations | 1.0-rc6 | 1.0 | 2019-07-16 | 2021-04-06 | BEAM-10350
com.google.cloud:google-cloud-bigquerystorage | 1.17.0 | 1.20.0 | 2021-03-30 |