Re: Create a Dataset in GCP Testing Project?

2022-01-18 Thread Austin Bennett
Following up here: it seems I have lost my access to the (GCP-based)
Testing project. I had not addressed/finished this ticket for some time, as
I had been working on other things for a while.

Can someone else please re-add me?  [Apologies if one of my emails already
has access and I just can't find it; I checked several.]
Thanks!

On Tue, Sep 28, 2021 at 2:47 PM Austin Bennett 
wrote:

> Thanks, and yes, something in the wiki would be helpful -- I looked there
> first (I'm still not certain about our conventions or the reasons behind
> them, so I'm not sure I'm the person to document anything beyond "send a
> message to the list if you're a committer and need access :-)").  Perhaps
> that is an issue which should be filed (at least to add to the backlog) in
> Jira.
>
> This kicks off some other questions about our infra and
> conventions/practices; I'm starting a new thread for that.
>
>
> On Mon, Sep 27, 2021 at 11:10 AM Robert Burke  wrote:
>
>> We should probably add something to the wiki for that.
>>
>> On Mon, Sep 27, 2021, 10:42 AM Brian Hulette  wrote:
>>
>>> I don't think there's any policy in place for controlling access to the
>>> apache-beam-testing project. I think in general PMC members are owners and
>>> committers are editors, but it looks like there are a lot of exceptions to
>>> this rule. For example, I am an owner - so I was able to grant you editor
>>> access. I think you should be able to create a new dataset now.
>>>
>>> Brian
>>>
>>> On Fri, Sep 24, 2021 at 12:07 PM Austin Bennett <
>>> whatwouldausti...@gmail.com> wrote:
>>>
 Hi Devs,

 I am working on https://issues.apache.org/jira/browse/BEAM-10652 and,
 specifically, sorting out the integration tests for it.  Based on the errors
 I'm seeing, I believe I need to create a dataset for this to work (a
 special-purpose dataset seems cleaner than reusing an existing one, and in
 line with the conventions I see).

 I would like to work through this myself, rather than have someone just
 handle it.

 How do I get access to our GCP projects used for testing?  I might also
 have questions about how we generally like things done (e.g., I haven't
 seen Terraform repos for how we manage that infrastructure ;-) ).

 Thanks,
 Austin




Contributor permission for Beam Jira tickets

2022-01-18 Thread Victor Chen
Hi Apache Beam Dev Team,

I'm Victor from Google and I am working with Ning on the Python Interactive
Runner. My ASF Jira ID is victorhc. Could I please have permissions to
create and assign tickets for my work?

Thank you,
Victor


Re: [RFC][design/idea] CDAP plugins support in Apache Beam

2022-01-18 Thread Kenneth Knowles
Very cool. Thanks for sharing!

On Tue, Jan 18, 2022 at 11:42 AM Ilya Kozyrev 
wrote:

> TL;DR: We want to develop support for Apache CDAP batch and streaming
> plugins to enrich Apache Beam's connectors to external applications. Please
> review the design[1] to help us bring CDAP plugin integrations into Apache
> Beam.
>
>
>
> Hi all,
>
>
>
> Along with a few community members, I have been thinking about creating an
> Apache Beam IO package for Apache CDAP. The CDAP IO package will enable
> integrating Apache CDAP plugins with Apache Beam to extend the application
> connectors offered by Apache Beam.
>
>
>
> The CDAP IO connector will support batch sources and sinks via
> HadoopFormatIO, and streaming sources via SparkReceiverIO, which proxies
> custom Spark Receivers for use in Apache Beam. The proposed design and
> implementation details are described in the design document[1].
>
>
>
> Initially, we are thinking of creating the following integrations:
>
>- Salesforce
>- ServiceNow (batch)
>- Zendesk (batch)
>- Salesforce Marketing Cloud (batch)
>- Hubspot
>
>
>
> Please share your feedback both on the idea and the design doc.
>
>
>
> Thanks!
>
>
>
> [1]
> https://docs.google.com/document/d/1T-bhd0Qk7DBePIfgHEPagYiA1oLP4z5kYEd0S1SOGxQ/edit?usp=sharing
>
>
>
>
>


Re: Default output timestamp of processing-time timers

2022-01-18 Thread Kenneth Knowles
Yeah, it makes sense. This is an issue for the global window, where there
isn't automatic cleanup of state. I've seen a few use cases where users
would like a good way of doing state cleanup in the global window too:
something where, whenever state gets buffered, there is always a finite
timer that will fire. There might be an opportunity here, if we attach the
hold to that associated timer rather than to the state. It sounds similar to
what you describe, where someone made a timer just to create a watermark
hold associated with some state; I assume they actually do need to process
and emit that state in some way related to the timer.
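Kenn's invariant above, that whenever state gets buffered there is always a finite timer that will fire, can be simulated outside Beam. This is a toy model in plain Python, not a Beam API; all names are illustrative:

```python
import heapq

class CleanupState:
    """Toy model of per-key buffered state in the global window where every
    write also schedules a finite cleanup timer (illustrative, not Beam)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.state = {}   # key -> buffered values
        self.timers = []  # min-heap of (fire_time, key)

    def buffer(self, key, value, now):
        self.state.setdefault(key, []).append(value)
        # Pair every buffered write with a timer guaranteed to fire.
        heapq.heappush(self.timers, (now + self.ttl, key))

    def advance(self, now):
        # Fire all due timers, clearing the state they guard.
        while self.timers and self.timers[0][0] <= now:
            _, key = heapq.heappop(self.timers)
            self.state.pop(key, None)

s = CleanupState(ttl=10)
s.buffer("k", 1, now=0)
s.advance(now=5)        # timer not due; state survives
assert "k" in s.state
s.advance(now=10)       # timer fires; state is cleaned up
assert "k" not in s.state
```

A watermark hold attached to the timer rather than to the state, as Kenn suggests, would then be released automatically when the timer fires.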

On Tue, Jan 18, 2022 at 9:35 AM Reuven Lax  wrote:

> Correct.
>
> IIRC originally we didn't want to add "buffered data timestamps"
> because it was error prone. Leaking even one record in state holds up the
> watermark and can cause the entire pipeline to grind to a halt. Associating
> with a timer guarantees that holds are always cleared eventually.
>
> On Tue, Jan 18, 2022 at 9:13 AM Kenneth Knowles  wrote:
>
>> This is an interesting case, and a legitimate counterexample to consider.
>> I'd call it a workaround :-). The semantic thing they would want/need is
>> "output timestamp" associated with buffered data (also implemented with
>> watermark hold). I do know systems that designed their state with this
>> built in.
>>
>> Kenn
>>
>> On Tue, Jan 18, 2022 at 8:57 AM Reuven Lax  wrote:
>>
>>> One note - some people definitely use timer.withOutputTimestamp as a
>>> watermark hold.
>>>
>>
>>> This is a scenario in which one outputs (from processElement) a
>>> timestamp behind the current input element timestamp but knows that it is
>>> safe because there is already an extant timer with an earlier
>>> output timestamp (state can be used for this). In this case I've seen
>>> timers set simply for the hold - the actual onTimer never outputs anything.
>>>
>>> Reuven
>>>
>>> On Tue, Jan 18, 2022 at 6:42 AM Kenneth Knowles  wrote:
>>>


 On Tue, Dec 14, 2021 at 2:38 PM Steve Niemitz 
 wrote:

> > I think this wouldn't be very robust to different situations where
> processing time and event time may not be that close to each other.
>
> if you do something like `min(endOfWindow, max(eventInputTimestamp,
> computedFiringTimestamp))` the worst case is that you set a watermark hold
> for somewhere in the future, right?  For example, if the watermark is
> lagging 3 hours, processing time = 4pm, event input = 1pm, window end =
> 5pm, the watermark hold/output time is set to 4pm + T.  This would make 
> the
> timestamps "newer" than the input, but shouldn't ever create late data,
> correct?
>
> Also, imo, the timestamps really already cross domains now, because
> the watermark (event time) is held until the (processing time) timer 
> fires.
>
> The concrete issue that brought this up was a pipeline with some
> state, and the state was "cleaned up" periodically with a processing time
> timer that fired every ~hour.  The author of the pipeline was confused why
> the watermark wasn't moving (and thus GBKs firing, etc).  The root cause
> was the watermark being held by the timer.
>
> > It would just save you .withOutputTimestamp(elementTimestamp) on
> your calls to setting the event time timer, right?
>
> Correct, the main thing I'm trying to solve is having to recalculate
> an output timestamp using the same logic that the timer itself is using to
> set its firing timestamp.
>

 It sounds like the main use case that you are dealing with is the case
 where the timer doesn't actually produce output (or set further timers that
 produce output) so it doesn't need (or want) a watermark hold. That makes
 sense.

 In fact, I do not view a "watermark hold" as a fundamental concept. The
 act of "set a timer with the intent that I am allowed to produce output
 with timestamp X" is the fundamental concept, and watermark hold is an
 implementation detail that should really never have been surfaced as an
 end-user concept, or really even as an SDK author concept. This is why in
 my proposal for adding output timestamps to timers, I called it
 "withOutputTimestamp", and this is why the design does not include any
 watermark holds - there is a self-loop on a transform where timers produce
 an input watermark distinct from the watermark on input elements, and that
 is enough. There is not now, and never has been, a need for the concept of
 a hold at the level of the Beam model.

 I wonder if we can automate this behavior by noticing that there is no
 OutputReceiver parameter in the timer callback, and also transitively. Or
 just work around it by saying ".withoutOutput" on the timer.

 Kenn


>
>
>
> On Tue, Dec 14, 2021 at 4:10 PM Kenneth Knowles 
> wrote:
>
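Steve's proposed clamp from earlier in the thread, `min(endOfWindow, max(eventInputTimestamp, computedFiringTimestamp))`, can be checked against his own numbers in a few lines of plain Python (a sketch, not Beam code):

```python
from datetime import datetime

def held_output_timestamp(end_of_window, event_input_ts, computed_firing_ts):
    # Never earlier than the input (so no late data), never later than
    # the end of the window.
    return min(end_of_window, max(event_input_ts, computed_firing_ts))

# The example from the thread: watermark lagging 3 hours,
# processing time = 4pm, event input = 1pm, window end = 5pm.
one_pm = datetime(2021, 12, 14, 13, 0)
four_pm = datetime(2021, 12, 14, 16, 0)
five_pm = datetime(2021, 12, 14, 17, 0)

# The hold lands on 4pm: "newer" than the 1pm input, but never late.
assert held_output_timestamp(five_pm, one_pm, four_pm) == four_pm
```

If the computed firing time were past the end of the window, the clamp would pin the hold to the window end instead.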

[RFC][design/idea] CDAP plugins support in Apache Beam

2022-01-18 Thread Ilya Kozyrev
TL;DR: We want to develop support for Apache CDAP batch and streaming plugins
to enrich Apache Beam's connectors to external applications. Please review the
design[1] to help us bring CDAP plugin integrations into Apache Beam.


Hi all,


Along with a few community members, I have been thinking about creating an
Apache Beam IO package for Apache CDAP. The CDAP IO package will enable
integrating Apache CDAP plugins with Apache Beam to extend the application
connectors offered by Apache Beam.


The CDAP IO connector will support batch sources and sinks via HadoopFormatIO,
and streaming sources via SparkReceiverIO, which proxies custom Spark Receivers
for use in Apache Beam. The proposed design and implementation details are
described in the design document[1].


Initially, we are thinking of creating the following integrations:

  *   Salesforce
  *   ServiceNow (batch)
  *   Zendesk (batch)
  *   Salesforce Marketing Cloud (batch)
  *   Hubspot


Please share your feedback both on the idea and the design doc.


Thanks!


[1] 
https://docs.google.com/document/d/1T-bhd0Qk7DBePIfgHEPagYiA1oLP4z5kYEd0S1SOGxQ/edit?usp=sharing




Re: Default output timestamp of processing-time timers

2022-01-18 Thread Kenneth Knowles
This is an interesting case, and a legitimate counterexample to consider.
I'd call it a workaround :-). The semantic thing they would want/need is
"output timestamp" associated with buffered data (also implemented with
watermark hold). I do know systems that designed their state with this
built in.

Kenn

On Tue, Jan 18, 2022 at 8:57 AM Reuven Lax  wrote:

> One note - some people definitely use timer.withOutputTimestamp as a
> watermark hold.
>

> This is a scenario in which one outputs (from processElement) a timestamp
> behind the current input element timestamp but knows that it is safe
> because there is already an extant timer with an earlier output timestamp
> (state can be used for this). In this case I've seen timers set simply for
> the hold - the actual onTimer never outputs anything.
>
> Reuven
>
> On Tue, Jan 18, 2022 at 6:42 AM Kenneth Knowles  wrote:
>
>>
>>
>> On Tue, Dec 14, 2021 at 2:38 PM Steve Niemitz 
>> wrote:
>>
>>> > I think this wouldn't be very robust to different situations where
>>> processing time and event time may not be that close to each other.
>>>
>>> if you do something like `min(endOfWindow, max(eventInputTimestamp,
>>> computedFiringTimestamp))` the worst case is that you set a watermark hold
>>> for somewhere in the future, right?  For example, if the watermark is
>>> lagging 3 hours, processing time = 4pm, event input = 1pm, window end =
>>> 5pm, the watermark hold/output time is set to 4pm + T.  This would make the
>>> timestamps "newer" than the input, but shouldn't ever create late data,
>>> correct?
>>>
>>> Also, imo, the timestamps really already cross domains now, because the
>>> watermark (event time) is held until the (processing time) timer fires.
>>>
>>> The concrete issue that brought this up was a pipeline with some state,
>>> and the state was "cleaned up" periodically with a processing time timer
>>> that fired every ~hour.  The author of the pipeline was confused why the
>>> watermark wasn't moving (and thus GBKs firing, etc).  The root cause was
>>> the watermark being held by the timer.
>>>
>>> > It would just save you .withOutputTimestamp(elementTimestamp) on your
>>> calls to setting the event time timer, right?
>>>
>>> Correct, the main thing I'm trying to solve is having to recalculate an
>>> output timestamp using the same logic that the timer itself is using to set
>>> its firing timestamp.
>>>
>>
>> It sounds like the main use case that you are dealing with is the case
>> where the timer doesn't actually produce output (or set further timers that
>> produce output) so it doesn't need (or want) a watermark hold. That makes
>> sense.
>>
>> In fact, I do not view a "watermark hold" as a fundamental concept. The
>> act of "set a timer with the intent that I am allowed to produce output
>> with timestamp X" is the fundamental concept, and watermark hold is an
>> implementation detail that should really never have been surfaced as an
>> end-user concept, or really even as an SDK author concept. This is why in
>> my proposal for adding output timestamps to timers, I called it
>> "withOutputTimestamp", and this is why the design does not include any
>> watermark holds - there is a self-loop on a transform where timers produce
>> an input watermark distinct from the watermark on input elements, and that
>> is enough. There is not now, and never has been, a need for the concept of
>> a hold at the level of the Beam model.
>>
>> I wonder if we can automate this behavior by noticing that there is no
>> OutputReceiver parameter in the timer callback, and also transitively. Or
>> just work around it by saying ".withoutOutput" on the timer.
>>
>> Kenn
>>
>>
>>>
>>>
>>>
>>> On Tue, Dec 14, 2021 at 4:10 PM Kenneth Knowles  wrote:
>>>


 On Tue, Dec 7, 2021 at 7:27 AM Steve Niemitz 
 wrote:

> If I have a processing time timer, is there any way to automatically
> set the output timestamp to the timer firing timestamp (similar to how
> event-time timers work)?
>
> A common use case would be to do something like:
> timer.offset(X).align(Y).setRelative()
>


 but have the output timestamp be the firing timestamp.  In order to do
> this now you need to re-calculate the output timestamp (using the same
> logic as the timer does internally) and manually use withOutputTimestamp.


 I think this wouldn't be very robust to different situations where
 processing time and event time may not be that close to each other. In
 general I'm skeptical of reusing timestamps across time domains, for just
 this sort of reason. I wouldn't recommend doing this manually either.


> I'm not sure what the API would look like here, but it would also be
> nice to allow event-time timers to do the same in reverse (use the element
> input timestamp rather than the firing timestamp).  Maybe something like
> `withDefaultOutputTimestampFrom(...)` and an enum of FIRING_TIMESTAMP,
> ELEMENT_TIMESTAMP?

Flaky test issue report (45)

2022-01-18 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13611: 
CrossLanguageJdbcIOTest.test_xlang_jdbc_write failing in Python PostCommits 
(created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13575: Flink 
testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13525: Java VR (Dataflow, V2, 
Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests (created 
2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13522: Spark tests failing 
PerKeyOrderingTest (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13519: Java precommit flaky 
(timing out) (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13500: NPE in Flink Portable 
ValidatesRunner streaming suite (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use 
(created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13401: 
beam_PostCommit_Java_DataflowV2 
org.apache.beam.sdk.io.gcp.pubsublite.ReadWriteIT flaky (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
 is flaky in Java Spark ValidatesRunner suite  (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13234: Flake in 
StreamingWordCountIT.test_streaming_wordcount_it (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2   (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12858: 
org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler 
is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12793: 
beam_PostRelease_NightlySnapshot failed (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12673: 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey (created 2021-07-28)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-11837: Java build flakes: 
"Memory constraints are impeding performance" (created 2021-02-18)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) 

P1 issues report (66)

2022-01-18 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-13672: Window.into() without a 
windowFn not correctly translated to portable representation (created 
2022-01-17)
https://issues.apache.org/jira/browse/BEAM-13668: Java Spanner IO Request 
Count metrics broke backwards compatibility (created 2022-01-15)
https://issues.apache.org/jira/browse/BEAM-13665: Spanner IO request 
metrics requires projectId within the config when it didn't in the past 
(created 2022-01-14)
https://issues.apache.org/jira/browse/BEAM-13616: Update protobuf-java to 
3.19.2 and other vendored dependencies that use protobuf (created 2022-01-08)
https://issues.apache.org/jira/browse/BEAM-13615: Bumping up FnApi 
environment version to 9 in Java, Python SDK (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13611: 
CrossLanguageJdbcIOTest.test_xlang_jdbc_write failing in Python PostCommits 
(created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13606: bigtable io doesn't 
handle non-ok row mutations (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13598: Install Java 17 on 
Jenkins VM (created 2022-01-04)
https://issues.apache.org/jira/browse/BEAM-13582: Beam website precommit 
mentions broken links, but passes. (created 2021-12-30)
https://issues.apache.org/jira/browse/BEAM-13579: Cannot run 
python_xlang_kafka_taxi_dataflow validation script on 2.35.0 (created 
2021-12-29)
https://issues.apache.org/jira/browse/BEAM-13522: Spark tests failing 
PerKeyOrderingTest (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13504: Remove 
provided/compileOnly deps not intended for external use (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13503: BulkIO public 
constructor: Missing required property: throwWriteErrors (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13487: WriteToBigQuery Dynamic 
table destinations returns wrong tableId (created 2021-12-17)
https://issues.apache.org/jira/browse/BEAM-13430: Upgrade Gradle version to 
7.3 (created 2021-12-09)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13237: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13213: OnWindowExpiration does 
not work without other state (created 2021-11-10)
https://issues.apache.org/jira/browse/BEAM-13203: Potential data loss when 
using SnsIO.writeAsync (created 2021-11-08)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13132: WriteToBigQuery submits a 
duplicate BQ load job if a 503 error code is returned from googleapi (created 
2021-10-27)
https://issues.apache.org/jira/browse/BEAM-13087: 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible 
(created 2021-10-20)
https://issues.apache.org/jira/browse/BEAM-13078: Python DirectRunner does 
not emit data at GC time (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13076: Python AfterAny, AfterAll 
do not follow spec (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files 
(created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with 
random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in 
CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) 
Bricked Dataflow Pipeline  (created 2021-09-06)
https://issues.apache.org/jira/browse/BEAM-12807: Java creates an incorrect 
pipeline proto when core-construction-java jar is not in the CLASSPATH (created 
2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12799: "Java IO IT Tests" - 
missing data in grafana (created 2021-08-25)
https://issues.apache.org/jira/browse/BEAM-12792: Beam worker only installs 
--extra_package once (created 2021-08-24)

Re: Default output timestamp of processing-time timers

2022-01-18 Thread Kenneth Knowles
On Tue, Dec 14, 2021 at 2:38 PM Steve Niemitz  wrote:

> > I think this wouldn't be very robust to different situations where
> processing time and event time may not be that close to each other.
>
> if you do something like `min(endOfWindow, max(eventInputTimestamp,
> computedFiringTimestamp))` the worst case is that you set a watermark hold
> for somewhere in the future, right?  For example, if the watermark is
> lagging 3 hours, processing time = 4pm, event input = 1pm, window end =
> 5pm, the watermark hold/output time is set to 4pm + T.  This would make the
> timestamps "newer" than the input, but shouldn't ever create late data,
> correct?
>
> Also, imo, the timestamps really already cross domains now, because the
> watermark (event time) is held until the (processing time) timer fires.
>
> The concrete issue that brought this up was a pipeline with some state,
> and the state was "cleaned up" periodically with a processing time timer
> that fired every ~hour.  The author of the pipeline was confused why the
> watermark wasn't moving (and thus GBKs firing, etc).  The root cause was
> the watermark being held by the timer.
>
> > It would just save you .withOutputTimestamp(elementTimestamp) on your
> calls to setting the event time timer, right?
>
> Correct, the main thing I'm trying to solve is having to recalculate an
> output timestamp using the same logic that the timer itself is using to set
> its firing timestamp.
>

It sounds like the main use case that you are dealing with is the case
where the timer doesn't actually produce output (or set further timers that
produce output) so it doesn't need (or want) a watermark hold. That makes
sense.

In fact, I do not view a "watermark hold" as a fundamental concept. The act
of "set a timer with the intent that I am allowed to produce output with
timestamp X" is the fundamental concept, and watermark hold is an
implementation detail that should really never have been surfaced as an
end-user concept, or really even as an SDK author concept. This is why in
my proposal for adding output timestamps to timers, I called it
"withOutputTimestamp", and this is why the design does not include any
watermark holds - there is a self-loop on a transform where timers produce
an input watermark distinct from the watermark on input elements, and that
is enough. There is not now, and never has been, a need for the concept of
a hold at the level of the Beam model.

I wonder if we can automate this behavior by noticing that there is no
OutputReceiver parameter in the timer callback, and also transitively. Or
just work around it by saying ".withoutOutput" on the timer.

Kenn


>
>
>
> On Tue, Dec 14, 2021 at 4:10 PM Kenneth Knowles  wrote:
>
>>
>>
>> On Tue, Dec 7, 2021 at 7:27 AM Steve Niemitz  wrote:
>>
>>> If I have a processing time timer, is there any way to automatically set
>>> the output timestamp to the timer firing timestamp (similar to how
>>> event-time timers work)?
>>>
>>> A common use case would be to do something like:
>>> timer.offset(X).align(Y).setRelative()
>>>
>>
>>
>> but have the output timestamp be the firing timestamp.  In order to do
>>> this now you need to re-calculate the output timestamp (using the same
>>> logic as the timer does internally) and manually use withOutputTimestamp.
>>
>>
>> I think this wouldn't be very robust to different situations where
>> processing time and event time may not be that close to each other. In
>> general I'm skeptical of reusing timestamps across time domains, for just
>> this sort of reason. I wouldn't recommend doing this manually either.
>>
>>
>>> I'm not sure what the API would look like here, but it would also be
>>> nice to allow event-time timers to do the same in reverse (use the element
>>> input timestamp rather than the firing timestamp).  Maybe something like
>>> `withDefaultOutputTimestampFrom(...)` and an enum of FIRING_TIMESTAMP,
>>> ELEMENT_TIMESTAMP?
>>>
>>
>> It would just save you .withOutputTimestamp(elementTimestamp) on your
>> calls to setting the event time timer, right? It doesn't work in general
>> because a timer can be set from other OnTimer methods, where there is no
>> "element" per se, but just the output timestamp of the fired timer.
>>
>> Kenn
>>
>
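The recomputation Steve wants to avoid, deriving the firing time that `timer.offset(X).align(Y).setRelative()` will choose, can be sketched in plain Python. The alignment rule here (round the target up to the next multiple of the period since the epoch) is an assumption about the semantics, not a statement of Beam's exact implementation:

```python
def firing_timestamp(now_ms, offset_ms, align_ms):
    # Target instant: now + offset.
    target = now_ms + offset_ms
    # Assumed align semantics: round up to the next multiple of the
    # alignment period (an already-aligned target stays put).
    return -(-target // align_ms) * align_ms

# now = 13s, offset = 5s, align = 10s: target 18s rounds up to 20s.
assert firing_timestamp(13_000, 5_000, 10_000) == 20_000
```

With a default output timestamp equal to the firing timestamp, this is exactly the value a user currently has to recompute by hand and pass to withOutputTimestamp.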


Re: Beam Java starter project template

2022-01-18 Thread Kenneth Knowles
I want to clarify one thing: I am not certain the requirement of ASL2
applies to example code snippets. I am also not sure if it makes a material
difference to users. I _am_ sure we would need to deal with some process to
use something other than ASL2, so I'd rather not.

Kenn

On Tue, Jan 18, 2022 at 6:17 AM Kenneth Knowles  wrote:

> Agree with Luke here. "Just git clone and go" is a big part of it.
>
> But also "I simply don't know what one would put in a Python
> repo, other than a bare setup.py that lists a dependency on
> apache_beam" is answered by David's initial email and his repo, namely:
>
>  - GitHub Actions configuration
>  - README.md
>  - example that already runs
>  - LICENSE (notably you've got it as MIT but to be part of Apache software
> it needs to be ASL2)
>
> Kenn
>
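A starter repo with the pieces Kenn lists above might pair them with a minimal build file. This is a hedged sketch only; the layout, the pinned Beam version, and the entry-point class are assumptions, not a prescribed template:

```groovy
// build.gradle for a hypothetical apache/beam-starter-java repo
plugins {
    id 'application'
}

repositories {
    mavenCentral()
}

dependencies {
    // Beam SDK core plus a local runner so the example runs out of the box.
    implementation 'org.apache.beam:beam-sdks-java-core:2.35.0'
    runtimeOnly 'org.apache.beam:beam-runners-direct-java:2.35.0'
    runtimeOnly 'org.slf4j:slf4j-jdk14:1.7.32'
}

application {
    // The "example that already runs" from the list above (assumed name).
    mainClass = 'com.example.App'
}
```

Tagging releases of such a repo to match SDK versions, as discussed later in the thread, keeps "git clone and go" working against a known-good Beam.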
> On Fri, Jan 14, 2022 at 2:34 PM Luke Cwik  wrote:
>
>> I think for consistency it makes sense for users to be told to check out
>> the git repo for the language of their choice and run it. Some repos will
>> have more or less setup than others.
>>
>> On Fri, Jan 14, 2022 at 2:26 PM Robert Bradshaw 
>> wrote:
>>
>>> +1 for doing this for Java, as setting up a project there is quite
>>> complicated. I simply don't know what one would put in a Python repo,
>>> other than a bare setup.py that lists a dependency on
>>> apache_beam. We don't have recommendations on file layout, etc. more
>>> than that (though there's plenty of generic advice to be found out
>>> there on the topic). I have a hunch go is similar, and javascript
>>> would be as well (npm install apache-beam and your package.json file
>>> gets updated).
>>>
>>> On Fri, Jan 14, 2022 at 2:17 PM Luke Cwik  wrote:
>>> >
>>> > There are several examples already within the Beam repo found in:
>>> > https://github.com/apache/beam/tree/master/examples
>>> > https://github.com/apache/beam/tree/master/sdks/go/examples
>>> >
>>> https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples
>>> >
>>> >
>>> > On Fri, Jan 14, 2022 at 11:07 AM Sachin Agarwal 
>>> wrote:
>>> >>
>>> >> I'd love to do something other than Wordcount, just for
>>> novelty/freshness, but I agree with the suggestion that having an example
>>> in each quickstart would be ideal.
>>> >>
>>> >> On Fri, Jan 14, 2022 at 11:06 AM David Huntsperger <
>>> dhuntsper...@google.com> wrote:
>>> >>>
>>> >>> + 1 to a separate repo for each language.
>>> >>>
>>> >>> Would it make sense to include the Wordcount example in each repo? I
>>> know that makes the repos less minimal, but we could rewrite the
>>> quickstarts around these repos instead of the current Wordcount examples.
>>> Or maybe we don't need to use the Wordcount example in the quickstarts...
>>> >>>
>>> >>> On Wed, Jan 12, 2022 at 1:54 PM David Cavazos 
>>> wrote:
>>> 
>>>  I agree with dropping the archetypes. Less maintenance is
>>> preferable, and the github repos are more flexible and maintainable.
>>> 
>>>  How about we create:
>>> 
>>>  apache/beam-starter-java
>>>  apache/beam-starter-python
>>>  apache/beam-starter-go
>>> 
>>>  During our OKR planning, +Keith Malvetti would prefer having repos
>>> for all languages. It makes sense for consistency as well.
>>> 
>>>  On Mon, Jan 10, 2022 at 5:14 PM Luke Cwik  wrote:
>>> >
>>> > As long as we have tags so that people can pull out a specific
>>> version of the examples that coincides with a specific SDK version then we
>>> could drop the archetypes.
>>> >
>>> > On Mon, Jan 10, 2022 at 4:09 PM Brian Hulette 
>>> wrote:
>>> >>
>>> >> > Being such minimal examples, I don't expect them to break
>>> commonly, but I think it would be good to make sure tests aren't failing
>>> when a release is published.
>>> >>
>>> >> Yeah it would be very unfortunate if we discovered a breakage
>>> after the release. Agree we should verify RCs (document as part of the
>>> release process), or even better, add automation to verify the repo against
>>> snapshots. The automation could be nice to have anyway since it provides an
>>> example for users to follow if they want to test against snapshots and
>>> report issues to us sooner.
>>> >>
>>> >>
>>> >> If we move forward with this can we drop the archetype?
>>> >>
>>> >> On Fri, Jan 7, 2022 at 3:54 PM Luke Cwik 
>>> wrote:
>>> >>>
>>> >>> Sounds reasonable.
>>> >>>
>>> >>> On Wed, Jan 5, 2022 at 12:47 PM David Cavazos <
>>> dcava...@google.com> wrote:
>>> 
>>>  I personally like the idea of a separate repo since we can see
>>> how a true minimal project looks like. Having it in the main repo would
>>> inherit build file configurations and other settings that would be
>>> different from a clean project, so it could be non-trivial to adapt. Also
>>> as its own repo, it's easier to clone and modify, or create an instance of
>>> the template.
>>> 
>>>  Dependabot can take care of 

Re: add developer

2022-01-18 Thread Kenneth Knowles
Hi Andrei,

I've added you to the "Contributors" role on Jira, so you can be assigned
tickets. Is this what you mean?

Kenn

On Tue, Jan 18, 2022 at 6:15 AM Andrei Kustov 
wrote:

> Hi community, sorry if I confused anybody with my previous mail.
> Could someone please add me to the Apache Jira as a developer?
> This is my Jira ID: andreykus
> --
> *From:* Andrei Kustov
> *Sent:* January 17, 2022, 10:15:32
> *To:* dev@beam.apache.org
> *Subject:* add developer
>
>
> Good day.
>
> I want to participate in the development of Apache Beam.
>
> Add me as a developer
>
>
> Best regards,
> Kustov Andrey (andrei.kus...@akvelon.com)
> Akvelon Inc.
>
>
>


Re: Beam Java starter project template

2022-01-18 Thread Kenneth Knowles
Agree with Luke here. "Just git clone and go" is a big part of it.

But also, the question "I simply don't know what one would put in a Python
repo other than a bare setup.py that lists a dependency on
apache_beam" is answered by David's initial email and his repo, namely:

 - GitHub Actions configuration
 - README.md
 - example that already runs
 - LICENSE (notably you've got it as MIT but to be part of Apache software
it needs to be ASL2)
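For the Python starter, the GitHub Actions piece of that list could be as small as the following sketch (the workflow name, Python version, and `main.py` entry point are illustrative assumptions, not an agreed convention):

```yaml
# .github/workflows/test.yml -- run the starter example on every push and PR
name: test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      # Install the one dependency the bare setup.py declares
      - run: pip install apache-beam
      # The "example that already runs", executed on the direct runner
      - run: python main.py
```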

Kenn

On Fri, Jan 14, 2022 at 2:34 PM Luke Cwik  wrote:

> I think for consistency it makes sense for users to be told to check out
> the git repo for the language of their choice and run it. Some repos will
> have more or less setup necessary than others.
>
> On Fri, Jan 14, 2022 at 2:26 PM Robert Bradshaw 
> wrote:
>
>> +1 for doing this for Java, as setting up a project there is quite
>> complicated. I simply don't know what one would put in a Python repo
>> other than a bare setup.py that lists a dependency on
>> apache_beam. We don't have recommendations on file layout, etc. more
>> than that (though there's plenty of generic advice to be found out
>> there on the topic). I have a hunch go is similar, and javascript
>> would be as well (npm install apache-beam and your package.json file
>> gets updated).
>>
>> On Fri, Jan 14, 2022 at 2:17 PM Luke Cwik  wrote:
>> >
>> > There are several examples already within the Beam repo found in:
>> > https://github.com/apache/beam/tree/master/examples
>> > https://github.com/apache/beam/tree/master/sdks/go/examples
>> >
>> https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples
>> >
>> >
>> > On Fri, Jan 14, 2022 at 11:07 AM Sachin Agarwal 
>> wrote:
>> >>
>> >> I'd love to do something other than Wordcount just for
>> novelty/freshness but agreed with the suggestion that having an example in
>> each quickstart would be ideal.
>> >>
>> >> On Fri, Jan 14, 2022 at 11:06 AM David Huntsperger <
>> dhuntsper...@google.com> wrote:
>> >>>
>> >>> +1 to a separate repo for each language.
>> >>>
>> >>> Would it make sense to include the Wordcount example in each repo? I
>> know that makes the repos less minimal, but we could rewrite the
>> quickstarts around these repos instead of the current Wordcount examples.
>> Or maybe we don't need to use the Wordcount example in the quickstarts...
>> >>>
>> >>> On Wed, Jan 12, 2022 at 1:54 PM David Cavazos 
>> wrote:
>> 
>>  I agree with dropping the archetypes. Less maintenance is
>> preferable, and the github repos are more flexible and maintainable.
>> 
>>  How about we create:
>> 
>>  apache/beam-starter-java
>>  apache/beam-starter-python
>>  apache/beam-starter-go
>> 
>>  During our OKR planning, +Keith Malvetti would prefer having repos
>> for all languages. It makes sense for consistency as well.
>> 
>>  On Mon, Jan 10, 2022 at 5:14 PM Luke Cwik  wrote:
>> >
>> > As long as we have tags so that people can pull out a specific
>> version of the examples that coincides with a specific SDK version then we
>> could drop the archetypes.
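As a concrete sketch of that tagging workflow (repo name, tag names, and commit messages are illustrative stand-ins, using a local repository in place of GitHub):

```shell
# A stand-in "starter" repo gets one tag per Beam release; a user then
# clones at the tag that coincides with the SDK version they depend on.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/starter"
git -C "$tmp/starter" -c user.name=ci -c user.email=ci@example.com \
    commit -q --allow-empty -m "starter for Beam 2.35.0"
git -C "$tmp/starter" tag v2.35.0
git -C "$tmp/starter" -c user.name=ci -c user.email=ci@example.com \
    commit -q --allow-empty -m "bump to Beam 2.36.0"
git -C "$tmp/starter" tag v2.36.0
# Pull out the version of the examples matching the 2.35.0 SDK:
git clone -q --branch v2.35.0 "file://$tmp/starter" "$tmp/my-pipeline"
git -C "$tmp/my-pipeline" log -1 --format=%s
```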
>> >
>> > On Mon, Jan 10, 2022 at 4:09 PM Brian Hulette 
>> wrote:
>> >>
>> >> > Being such minimal examples, I don't expect them to break
>> commonly, but I think it would be good to make sure tests aren't failing
>> when a release is published.
>> >>
>> >> Yeah it would be very unfortunate if we discovered a breakage
>> after the release. Agree we should verify RCs (document as part of the
>> release process), or even better, add automation to verify the repo against
>> snapshots. The automation could be nice to have anyway since it provides an
>> example for users to follow if they want to test against snapshots and
>> report issues to us sooner.
>> >>
>> >>
>> >> If we move forward with this can we drop the archetype?
>> >>
>> >> On Fri, Jan 7, 2022 at 3:54 PM Luke Cwik  wrote:
>> >>>
>> >>> Sounds reasonable.
>> >>>
>> >>> On Wed, Jan 5, 2022 at 12:47 PM David Cavazos <
>> dcava...@google.com> wrote:
>> 
>>  I personally like the idea of a separate repo since we can see
>> what a truly minimal project looks like. Having it in the main repo would
>> inherit build file configurations and other settings that would be
>> different from a clean project, so it could be non-trivial to adapt. Also
>> as its own repo, it's easier to clone and modify, or create an instance of
>> the template.
>> 
>>  Dependabot can take care of updating the Beam version and other
>> dependencies automatically. Testing is already set up via GitHub actions
>> for every pull request, so it would automatically be tested as soon as
>> there is a new dependency version available.
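For reference, the Dependabot setup described could be this small (the ecosystem and schedule values are assumptions for a Python starter; the Java and Go repos would use `gradle` or `gomod` instead of `pip`):

```yaml
# .github/dependabot.yml -- open a PR whenever apache-beam (or any
# other pip dependency) publishes a new version
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
```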
>> 
>>  Being such minimal examples, I don't expect them to break
>> commonly, but I think it would be good to make sure tests aren't failing
>> when a release is published.
>> 
>>  I'm okay with 

Re: add developer

2022-01-18 Thread Andrei Kustov
Hi community, sorry if I confused anybody with my previous mail.
Could someone please add me to the Apache Jira as a developer?
This is my Jira ID: andreykus


From: Andrei Kustov
Sent: January 17, 2022, 10:15:32
To: dev@beam.apache.org
Subject: add developer


Good day.

I want to participate in the development of Apache Beam.

Add me as a developer


Best regards,

Kustov Andrey (andrei.kus...@akvelon.com)
Akvelon Inc.



Re: [DISCUSS] propdeps removal and what to do going forward

2022-01-18 Thread Kenneth Knowles
On Fri, Jan 14, 2022 at 9:34 AM Daniel Collins  wrote:

> > In particular the Hadoop/Spark and Kafka dependencies must be
> **provided** as they were. I am not sure of others but those three matter.
>
> I think there's a bit of a difference here between what should be the
> state in the short term versus the long term.
>
> In the short term, I agree that we should avoid changes to how these
> dependencies are reflected in the POM.
>
> In the long term, I don't think it makes sense for these to continue to be
> "provided" dependencies- if users wish to use a different version of
> hadoop, spark or kafka, they can explicitly override the dependencies with
> the version they want when building their JAR, even if there is a version
> listed as "compile" in the POM file on maven central. The only difference
> is that if they don't have a version preference, the one listed in the POM
> (that we tested with) will be used, which seems like an unambiguous win to
> me.
>

Agree with the sentiment. But I believe the issue is that some tooling will
bundle up the "compile" dependencies and submit with the job, which will
then have a conflict with the libraries on the cluster. On the other hand,
the user will always want to override the "provided" version to match the
cluster, in which case it will just be harmless duplicates on the
classpath, no? I guess huge file size, but it isn't the 90s any more. Since
Ismaël commented, maybe he can help to clarify. I also knew about this
reasoning for Spark & Hadoop but I don't know exactly what is required to
make it work right.
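As a sketch of the trade-off (coordinates and version numbers below are illustrative, not Beam's actual pins): with provided scope the published Beam POM declares the dependency but keeps it off the user's runtime classpath, and the user supplies the version that matches their cluster:

```xml
<!-- In the published Beam module POM (illustrative): -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>2.4.1</version>
  <!-- compiled and tested against, but not bundled into the user's job jar -->
  <scope>provided</scope>
</dependency>

<!-- In the user's own POM, pinning whatever the cluster actually runs: -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>2.8.1</version>
</dependency>
```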

This could become a bothersome issue long term - Gradle dev community has
lots of posts that indicate they don't agree with the existence of
"provided" or "optional" dependencies. (I happen to agree with them, but
philosophy is not the point). We should have a very clear solution for the
cases that require one, and document it at least on the wiki.
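On the Gradle side, the closest built-in replacement for propdeps' `provided` is `compileOnly`, with the caveat that `compileOnly` dependencies are by default dropped from the published POM entirely unless the publication is adjusted, which is exactly the discrepancy being discussed here (coordinates are illustrative):

```groovy
// build.gradle -- compileOnly puts the jar on the compile classpath only;
// it is absent from the runtime classpath AND, by default, absent from
// the published POM
dependencies {
    compileOnly "org.apache.hadoop:hadoop-client:2.10.1"
}
```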

Kenn


>
> -Daniel
>
> On Thu, Jan 13, 2022 at 4:19 PM Ismaël Mejía  wrote:
>
>> Optional dependencies should not be a major issue.
>>
>> What matters to validate that we are not breaking users is to compare
>> the generated POM files with the previous (pre gradle 7 / 2.35.0)
>> version and see that what was provided is still provided.
>>
>> In particular the Hadoop/Spark and Kafka dependencies must be
>> **provided** as they were. I am not sure of others but those three
>> matter.
>>
>> Ismaël
>>
>> On Wed, Jan 12, 2022 at 10:55 PM Emily Ye  wrote:
>> >
>> > We've chatted offline and have a tentative plan for what to do with
>> these dependencies that are currently marked as compileOnly (instead of
>> provided). Please review the list if possible [1].
>> >
>> > Two projects we aren't sure about:
>> >
>> > :sdks:java:io:hcatalog
>> >
>> > library.java.jackson_annotations
>> > library.java.jackson_core
>> > library.java.jackson_databind
>> > library.java.hadoop_common
>> > org.apache.hive:hive-exec
>> > org.apache.hive.hcatalog:hive-hcatalog-core
>> >
>> > :sdks:java:io:parquet
>> >
>> > library.java.hadoop_client
>> >
>> >
>> > Does anyone have experience with either of these IOs? ccing Chamikara
>> >
>> > Thank you,
>> > Emily
>> >
>> >
>> > [1]
>> https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>> >
>> > On Tue, Jan 11, 2022 at 6:38 PM Emily Ye  wrote:
>> >>
>> >> As the person volunteering to do fixes for this to unblock Beam
>> 2.36.0, I created a spreadsheet of the projects with dependencies changed
>> from provided to compile only [1]. I pre-filled with what I think things
>> should be, but I don't have very much background in java/maven/gradle
>> configurations so please give input!
>> >>
>> >> Some (mainly hadoop/kafka) I left blank, since I'm not sure - do we
>> keep them provided because it depends on the user's version?
>> >>
>> >> [1]
>> https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>> >>
>> >> On Tue, Jan 11, 2022 at 1:17 PM Luke Cwik  wrote:
>> >>>
>> >>> I'm not following what you're trying to say, Kenn, since provided in
>> maven requires the user to explicitly add the dependency themselves to have
>> it as part of their runtime.
>> >>>
>> >>> As per
>> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#dependency-scope
>> >>> "
>> >>> * provided
>> >>> This is much like compile, but indicates you expect the JDK or a
>> container to provide the dependency at runtime. For example, when building
>> a web application for the Java Enterprise Edition, you would set the
>> dependency on the Servlet API and related Java EE APIs to scope provided
>> because the web container provides those classes. A dependency with this
>> scope is added to the classpath used for compilation and test, but not the
>> runtime classpath. It is not transitive."
>> >>>
>> >>> On Tue, Jan 11, 2022 at 11:54 AM Kenneth Knowles 
>> wrote:
>> 
>>  To clarify: "provided" should have 

Re: [DISCUSS] Migrate Jira to GitHub Issues?

2022-01-18 Thread Kenneth Knowles
I also think that we are at the point where a document describing them
side-by-side is needed. I would very much like to help. I strongly support
moving to GitHub Issues.

I'm less concerned about pros/cons (I think the one big pro of "everyone
knows it and already has an account" outweighs almost any con) but I want
to build a very clear plan of how we will map Jira features to GitHub
features. I use quite a lot of Jira's features. In particular, a lot of
things seem like they'll become conventions around labels, which I expect
to often be low enough data quality that we would just not bother, unless
we can control it a bit.

I eagerly await the link! Feel free to share very early :-)

Kenn

On Thu, Jan 13, 2022 at 1:48 PM Aizhamal Nurmamat kyzy 
wrote:

> I think I am enthusiastic enough to help with the doc :) will share the
> link soon.
>
> On Thu, Jan 13, 2022 at 10:12 AM Robert Bradshaw 
> wrote:
>
>> I don't know if we have consensus, but it seems that some people are
>> quite supportive (myself included), and some are ambivalent. The only
>> major con I can see is that github doesn't support tagging an issue to
>> multiple milestones (but it's unclear how important that is).
>>
>> I would suggest that someone enthusiastic about this proposal put
>> together a doc where we can enumerate the pros and cons and once the
>> list seems complete we can bring it back to the list for further
>> discussion and/or a vote (if needed, likely not).
>>
>> On Thu, Jan 13, 2022 at 9:27 AM Alexey Romanenko
>>  wrote:
>> >
>> > I’m not sure that we have a consensus on this. Since this thread
>> initially was started to discuss and gather some feedback then I think it
>> would be great to have a summary with pros and cons of this migration.
>> >
>> > —
>> > Alexey
>> >
>> > On 13 Jan 2022, at 00:11, Aizhamal Nurmamat kyzy 
>> wrote:
>> >
>> > Hi all,
>> >
>> > Is there a consensus to migrate to GitHub?
>> >
>> > On Wed, Dec 15, 2021 at 9:17 AM Brian Hulette 
>> wrote:
>> >>
>> >>
>> >>
>> >> On Tue, Dec 14, 2021 at 1:14 PM Kenneth Knowles 
>> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Dec 9, 2021 at 11:50 PM Jean-Baptiste Onofre 
>> wrote:
>> 
>>  Hi,
>> 
>>  No problem for me. The only thing I don’t like with GitHub issues is
>> the fact that it’s not possible to “assign” several milestones to an issue.
>>  When we maintain several active branches/versions, it sucks (one issue
>> == one milestone), as we have to create several issues.
>> >>>
>> >>>
>> >>> This is a good point to consider. In Beam we often create multiple
>> issues anyhow when we intend to backport/cherrypick a fix. One issue for
>> the original fix and one each targeted cherrypick. This way their
>> resolution status can be tracked separately. But it is nice for users to be
>> able to go back and edit the original bug report to say which versions are
>> affected and which are not.
>> >>
>> >>
>> >> I looked into this a little bit. It looks like milestones don't have
>> to represent a release (e.g. they could represent some abstract goal), but
>> they are often associated with releases. This seems like a reasonable field
>> to map to "Fix Version/s" in jira, but jira does support specifying
>> multiple releases. So one issue == one milestone would be a regression.
>> >> As Kenn pointed out though we often create a separate jira to track
>> backports anyway (even though we could just specify multiple fix versions),
>> so I'm not sure this is a significant blocker.
>> >>
>> >> If we want to use milestones to track abstract goals, I think we'd be
>> out of luck. We could just use labels, but the GitHub UI doesn't present a
>> nice burndown chart for those. See
>> https://github.com/pandas-dev/pandas/milestones vs.
>> https://github.com/pandas-dev/pandas/labels. FWIW jira doesn't have
>> great functionality here either.
>> >>
>> >>>
>> >>>
>> >>> Kenn
>> >>>
>> 
>> 
>>  Regards
>>  JB
>> 
>>  > Le 10 déc. 2021 à 01:28, Kyle Weaver  a
>> écrit :
>>  >
>>  > I’m in favor of switching to Github issues. I can’t think of a
>> single thing jira does better.
>>  >
>>  > Thanks Jarek, this is a really great resource [1]. For another
>> reference, the Calcite project is engaged in the same discussion right now
>> [2]. I came up with many of the same points independently before I saw
>> their thread.
>>  >
>>  > When evaluating feature parity, we should make a distinction
>> between non-structured (text) and structured data. And we don’t need a
>> strict mechanical mapping for everything unless we’re planning on
>> automatically migrating all existing issues. I don’t see the point in
>> automatic migration, though; as Jarek pointed out, we’d end up perpetuating
>> a ton of obsolete issues.
>>  >
>>  >   • We use nested issues and issue relations in jira, but as
>> far as I know robots don’t use them and we don’t query them much, so we’re
>> not losing anything by moving from an API to 

add developer

2022-01-18 Thread Andrei Kustov
Good day.

I want to participate in the development of Apache Beam.

Add me as a developer


Best regards,

Kustov Andrey (andrei.kus...@akvelon.com)
Akvelon Inc.