Re: [Request for Feedback] Swift SDK Prototype

2023-08-17 Thread Byron Ellis via dev
Thanks Cham,

Definitely happy to open a draft PR so folks can comment. There's not as
much code as it looks like, since most of the LOC is just generated
protobuf. As for the support, I definitely want to add external transforms
and may actually add that support before adding the ability to make
composites in the language itself. With the way the SDK is laid out, adding
composites to the pipeline graph is a separate operation from defining a
composite.

On Thu, Aug 17, 2023 at 4:28 PM Chamikara Jayalath 
wrote:

> Thanks Byron. This sounds great. I wonder if there is interest in Swift
> SDK from folks currently subscribed to the +user list.
>
> On Wed, Aug 16, 2023 at 6:53 PM Byron Ellis via dev 
> wrote:
>
>> Hello everyone,
>>
>> A couple of months ago I decided that I wanted to really understand how
>> the Beam FnApi works and how it interacts with the Portable Runner. For me
>> at least, that usually means I need to write some code so I can see things
>> happening in a debugger. To really prove to myself that I understood what
>> was going on, I decided I couldn't use an existing SDK language, since
>> there would be the temptation to read some code and convince myself that I
>> actually understood what was going on.
>>
>> One thing led to another and it turns out that to get a minimal FnApi
>> integration going you end up writing a fair bit of an SDK. So I decided to
>> take things to a point where I had an SDK that could execute a word count
>> example via a portable runner backend. I've now reached that point and
>> would like to submit my prototype SDK to the list for feedback.
>>
>> It's currently living in a branch on my fork here:
>>
>> https://github.com/byronellis/beam/tree/swift-sdk/sdks/swift
>>
>> At the moment it runs via the most recent Xcode beta using Swift 5.9 on
>> Intel Macs, but should also work using beta builds of Swift 5.9 for Linux
>> running on Intel hardware. I haven't had a chance to try it on ARM hardware
>> and make sure all of the endian checks are complete. The
>> "IntegrationTests.swift" file contains a word count example that reads some
>> local files (as well as a missing file to exercise DLQ functionality) and
>> outputs counts through two separate group by operations to get it past the
>> "map reduce" size of pipeline. I've tested it against the Python Portable
>> Runner. Since my goal was to learn FnApi, there is no Direct Runner at this
>> time.
>>
>> I've shown it to a couple of folks already and incorporated some of their
>> feedback (for example, pardo was originally called dofn when defining
>> pipelines). In general I've tried to make the API as "Swift-y" as
>> possible, hence the heavy reliance on closures. While there aren't yet
>> composite PTransforms, there are the beginnings of what would be needed
>> for a SwiftUI-like declarative API for creating them.
>>
>> There are of course a ton of missing bits still to be implemented, like
>> counters, metrics, windowing, state, timers, etc.
>>
>
> This should be fine and we can get the code documented without these
> features. I think support for composites and adding an external transform
> (see Java, Python, Go, TypeScript) to add support for multi-lang will bring
> in a lot of features (for example, I/O connectors) for free.
>
>
>>
>> Any and all feedback welcome and happy to submit a PR if folks are
>> interested, though the "Swift Way" would be to have it in its own repo so
>> that it can easily be used from the Swift Package Manager.
>>
>
> +1 for creating a PR (maybe as a draft initially). Also it'll be easier
> to comment on a PR :)
>
> - Cham
>
>
>
>>
>> Best,
>> B
>>
>>
>>


Re: [PROPOSAL] Preparing for 2.50.0 Release

2023-08-17 Thread Robert Burke
RC1 is sufficiently ready for testing and validation. Please vote and discuss
on that thread.

https://lists.apache.org/thread/xgx49zshms7253lfx6d6lsnvwf7tyyfp

Robert Burke
2.50.0 Release Manager

On 2023/08/17 02:30:00 Robert Burke wrote:
> Despite my best efforts, Python continues to vex me. RC1 is almost ready,
> just missing the Beam site and doc updates PR, and (optionally) the
> TypeScript container.
> 
> So I'm calling it a night, and will build and send out a partial docs PR in
> the morning.
> Robert Burke
> 2.50.0 Release Manager
> 
> On Wed, Aug 16, 2023, 8:08 AM Robert Burke  wrote:
> 
> > Just a status update: Branch is cut and tagged
> >
> > https://github.com/apache/beam/tree/release-2.50.0
> > https://github.com/apache/beam/tree/v2.50.0-RC1
> >
> > I'm working on the remaining bits to have an RC. The GitHub
> > build-release-artifacts action failed to build and publish the Java
> > artifacts and stage the Docker containers.
> >
> > The former says:
> >
> > Execution failed for task ':sdks:java:io:solr:compileTestJava'.
> > GC overhead limit exceeded
> >
> > The latter was due to a partial application of the multi-arch build to
> > the GitHub Actions, which has already been fixed.
> >
> > The Dataflow Legacy Java worker and associated containers have been built
> > and published, and we apologize for the delay this caused. We're discussing
> > how we presently interleave Google internal processes with the release, and
> > how we can streamline things now that Dataflow is transitioning to RunnerV2
> > by default. In future releases, we may build the non-portable Dataflow Java
> > workers after the first RC is tagged and the open side is on its way.
> >
> > The hope is RC1 will be available tonight. Either way, this thread will be
> > updated with the status.
> >
> > Robert Burke
> > Beam 2.50.0 Release Manager
> >
> > On 2023/08/14 21:51:47 Robert Burke wrote:
> > > +1 to what XQ says.
> > >
> > > There will be a voting email thread once I've done the appropriate due
> > > diligence on the branch and finished with the Dataflow artifacts.
> > >
> > > Generally speaking, the best validation is something you're already
> > > using, to make sure that the new version of Beam works for your usage.
> > >
> > >
> > > On Mon, Aug 14, 2023, 2:41 PM XQ Hu via dev  wrote:
> > >
> > > > Welcome to the Beam community! Our release managers usually follow this
> > > > https://beam.apache.org/contribute/release-guide/#10-vote-and-validate-release-candidate
> > > > to send the votes out and ask for any feedback regarding the release
> > > > candidate. If you could help run any validation on your side and cast
> > > > your vote, it would be greatly appreciated and helpful for the community.
> > > >
> > > > On Mon, Aug 14, 2023 at 12:23 PM Hong  wrote:
> > > >
> > > >> I see, thanks for clarifying, Robert!
> > > >>
> > > >> Is there anything I can help with for validation? Is there a wiki page
> > > >> with the expected validations I can help with?
> > > >>
> > > >> Best
> > > >> Hong
> > > >>
> > > >> On 14 Aug 2023, at 14:34, Robert Burke  wrote:
> > > >>
> > > >> 
> > > >> The release branch was cut. Before the weekend, I was working on getting
> > > >> the non-portable Dataflow Java worker built and available before
> > > >> producing the RC1. The actual building bit doesn't take that long, but
> > > >> there's a bunch of additional validation that goes along with it.
> > > >>
> > > >> The current target date for 2.50.0 is September 13th, but ultimately
> > > >> it's as soon as we have a validated and voted-on RC.
> > > >>
> > > >> On Mon, Aug 14, 2023, 3:43 AM Hong Liang  wrote:
> > > >>
> > > >>> Thanks for driving this Robert!
> > > >>>
> > > >>> It seems the two PRs specified have been merged. I'm a little new to
> > > >>> Beam; do we have an expected release date for the 2.50 release?
> > > >>>
> > > >>> Best,
> > > >>> Hong
> > > >>>
> > > >>> On Thu, Aug 10, 2023 at 3:08 AM Robert Burke 
> > > >>> wrote:
> > > >>>
> > >  I'm in the process of producing the cut branch, but due to various
> > >  delays on my part, it will not be cut today.
> > >
> > >  There are two outstanding PRs blocking the cut,
> > >  https://github.com/apache/beam/pull/27947 and
> > >  https://github.com/apache/beam/pull/27939, but once those are in, I'll
> > >  proceed. Remaining new issues will be cherry-picked as required.
> > >
> > >  Thanks
> > >  Robert Burke
> > >  Beam 2.50.0 Release Manager
> > >
> > >  On 2023/07/26 15:49:37 Robert Burke wrote:
> > >  > Hey Beam community,
> > >  >
> > >  > The next release (2.50.0) branch cut is scheduled on August 9th, 2023,
> > >  > according to the release calendar [1].
> > >  >
> > >  > I volunteer to perform this release. My plan is to cut the branch on
> > >  > that date, and cherry-pick release-blocking fixes afterwards, if any.
> > > 

[VOTE] Release 2.50.0, release candidate #1

2023-08-17 Thread Robert Burke
Hi everyone,
Please review and vote on the release candidate #1 for the version 2.50.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found. Only PMC member votes will
count towards the final vote, but votes from all community members are
encouraged and helpful for finding regressions; you can either test your
own use cases or use cases from the validation sheet [10].

Additional notes about this RC:

* There were issues staging the Dataflow clones of the portable containers
to Google Container Registry and Google Artifact Registry, so those images
may not yet be available at those locations, which may impact starting jobs
with the RC against Google Cloud Dataflow.
  * This may be worked around by explicitly setting the portable container
to use with the --sdkContainerImage flag for Java, or the
--environment_config flag for Python and Go.
* Due to an issue with my build environment, there were issues producing
two artifacts for this RC.
  * The TypeScript SDK container has not yet been built or pushed. As an
experimental SDK this is not a release blocker. However, one will
eventually be published. In the meantime, the 2.49.0 container should be
sufficient.
  * Due to an issue with my build environment, the PyDocs are not currently
part of the Documentation PR update. This will block the final release of
2.50.0.
  * The current plan is to improve the GitHub Actions for releases to be
able to provide these artifacts, instead of performing a local fix to my
environment, to simplify further releases.


The staging area is available for your review, which includes:
* GitHub Release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint 02677FF4371A3756
(lostl...@apache.org) or D20316F712213422 (GitHub Action automated) [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.50.0-RC1" [5],
* website pull request listing the release [6], the blog post [6], and
publishing the API reference manual [7].
* Java artifacts were built with Gradle 7.5.1 and OpenJDK (Temurin) (build
1.8.0_382-b05).
* Python artifacts are deployed along with the source release to
dist.apache.org [2] and PyPI [8].
* Go artifacts and documentation are available at pkg.go.dev [9]
* Validation sheet with a tab for 2.50.0 release to help with validation
[10].
* Docker images published to Docker Hub [11].
* PR to run tests against release branch [12].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.
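The approval rule stated above can be read as a small predicate. The following is a hypothetical Go sketch, not a Beam or ASF tool: the `vote` type and `releaseApproved` function are illustrative names, and it assumes "majority approval" means more binding +1 than binding -1 votes, with non-binding (non-PMC) votes left out of the count. The ASF release policy remains the authoritative statement of the rule.

```go
package main

import "fmt"

// vote is one reply on a hypothetical release-vote thread.
type vote struct {
	Value   int  // +1, 0, or -1
	Binding bool // true for PMC members; only these count toward the result
}

// releaseApproved applies the rule as stated in the vote email: majority
// approval with at least three affirmative PMC (binding) votes.
// Non-binding votes still help find regressions but are not counted here.
func releaseApproved(votes []vote) bool {
	plus, minus := 0, 0
	for _, v := range votes {
		if !v.Binding {
			continue
		}
		switch v.Value {
		case 1:
			plus++
		case -1:
			minus++
		}
	}
	// Majority: more binding +1 than binding -1, with a floor of three +1.
	return plus >= 3 && plus > minus
}

func main() {
	votes := []vote{
		{Value: 1, Binding: true},
		{Value: 1, Binding: true},
		{Value: 1, Binding: false},
		{Value: -1, Binding: false},
		{Value: 1, Binding: true},
	}
	fmt.Println(releaseApproved(votes)) // prints "true"
}
```

In the example, the non-binding +1 and -1 do not change the outcome; three binding +1 votes against zero binding -1 votes are enough.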

For guidelines on how to try the release in your projects, check out our
blog post at https://beam.apache.org/blog/validate-beam-release/.

Thanks,
Robert Burke
Apache Beam 2.50.0 Release Manager

[1] https://github.com/apache/beam/milestone/14
[2] https://dist.apache.org/repos/dist/dev/beam/2.50.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1353/
[5] https://github.com/apache/beam/tree/v2.50.0-RC1
[6] https://github.com/apache/beam/pull/28055
[7] https://github.com/apache/beam-site/pull/647
[8] https://pypi.org/project/apache-beam/2.50.0rc1/
[9]
https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.50.0-RC1/go/pkg/beam
[10]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=...
[11] https://hub.docker.com/search?q=apache%2Fbeam=image
[12] https://github.com/apache/beam/pull/27962


Re: [Request for Feedback] Swift SDK Prototype

2023-08-17 Thread Chamikara Jayalath via dev
Thanks Byron. This sounds great. I wonder if there is interest in Swift SDK
from folks currently subscribed to the +user list.

On Wed, Aug 16, 2023 at 6:53 PM Byron Ellis via dev 
wrote:

> Hello everyone,
>
> A couple of months ago I decided that I wanted to really understand how
> the Beam FnApi works and how it interacts with the Portable Runner. For me
> at least, that usually means I need to write some code so I can see things
> happening in a debugger. To really prove to myself that I understood what
> was going on, I decided I couldn't use an existing SDK language, since
> there would be the temptation to read some code and convince myself that I
> actually understood what was going on.
>
> One thing led to another and it turns out that to get a minimal FnApi
> integration going you end up writing a fair bit of an SDK. So I decided to
> take things to a point where I had an SDK that could execute a word count
> example via a portable runner backend. I've now reached that point and
> would like to submit my prototype SDK to the list for feedback.
>
> It's currently living in a branch on my fork here:
>
> https://github.com/byronellis/beam/tree/swift-sdk/sdks/swift
>
> At the moment it runs via the most recent Xcode beta using Swift 5.9 on
> Intel Macs, but should also work using beta builds of Swift 5.9 for Linux
> running on Intel hardware. I haven't had a chance to try it on ARM hardware
> and make sure all of the endian checks are complete. The
> "IntegrationTests.swift" file contains a word count example that reads some
> local files (as well as a missing file to exercise DLQ functionality) and
> outputs counts through two separate group by operations to get it past the
> "map reduce" size of pipeline. I've tested it against the Python Portable
> Runner. Since my goal was to learn FnApi, there is no Direct Runner at this
> time.
>
> I've shown it to a couple of folks already and incorporated some of their
> feedback (for example, pardo was originally called dofn when defining
> pipelines). In general I've tried to make the API as "Swift-y" as
> possible, hence the heavy reliance on closures. While there aren't yet
> composite PTransforms, there are the beginnings of what would be needed
> for a SwiftUI-like declarative API for creating them.
>
> There are of course a ton of missing bits still to be implemented, like
> counters, metrics, windowing, state, timers, etc.
>

This should be fine and we can get the code documented without these
features. I think support for composites and adding an external transform
(see Java, Python, Go, TypeScript) to add support for multi-lang will bring
in a lot of features (for example, I/O connectors) for free.


>
> Any and all feedback welcome and happy to submit a PR if folks are
> interested, though the "Swift Way" would be to have it in its own repo so
> that it can easily be used from the Swift Package Manager.
>

+1 for creating a PR (maybe as a draft initially). Also it'll be easier to
comment on a PR :)

- Cham



>
> Best,
> B
>
>
>


The Swift SDK now works on Linux

2023-08-17 Thread Byron Ellis via dev
Hello everyone,

I got a chance to test and fix a small issue that prevented the Swift SDK
from working on Linux boxes due to differences in the way Data is handled
in non-macOS Foundation implementations (which hopefully will stop being an
issue with the native-Swift Foundation revamp). I also added the generated
protobuf, similar to how we do with Go, so people don't have to generate it
on their own. You should now be able to check out the branch and run
"swift test" on Linux boxes if you have a portable runner in the background
on port 8073 (which also makes it more plausible that this SDK would work
on, say, Dataflow).

This was tested with the August 11 release of the Swift 5.9 compiler.

Best,
B


[GitHub] [beam-site] lostluck opened a new pull request, #647: Publish 2.50.0 release

2023-08-17 Thread via GitHub


lostluck opened a new pull request, #647:
URL: https://github.com/apache/beam-site/pull/647

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [RFC] Bootloader Buffered Logging

2023-08-17 Thread Kerry Donny-Clark via dev
Much appreciated, reviewing the doc now.

On Wed, Aug 16, 2023 at 9:08 PM Valentyn Tymofieiev via dev <
dev@beam.apache.org> wrote:

> Thanks, Jack! left some comments, looking forward to this work!
>
> On Wed, Aug 16, 2023 at 10:31 AM Robert Burke  wrote:
>
>> I've added some comments but generally +1 on this.
>>
>> A later change might be able to build from this to ensure the various
>> STDErr and STDOut logs from the SDK harness executions are always plumbed
>> as described.
>>
>> But that would take more thought since other incidental logs from the
>> users worker binary (sic) might be misconstrued as serious when they were
>> largely benign noise previously ignored (since they were invisible).
>>
>> On Wed, Aug 16, 2023, 9:57 AM Jack McCluskey via dev 
>> wrote:
>>
>>> Hey everyone,
>>>
>>> I've written a small design doc around implementing some buffered
>>> logging for the Beam boot.go scripts that is available at
>>> https://s.apache.org/beam-buffered-logging. This should help surface
>>> errors that occur during worker set-up (like issues with dependency
>>> installation) that tend to be logged improperly at INFO.
>>>
>>> Thanks,
>>>
>>> Jack McCluskey
>>>
>>> --
>>>
>>>
>>> Jack McCluskey
>>> SWE - DataPLS PLAT/ Dataflow ML
>>> RDU
>>> jrmcclus...@google.com
>>>
>>>
>>>


Beam High Priority Issue Report (39)

2023-08-17 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. PeriodicImpulse) running in Flink and polling using tracker.defer_remainder have checkpoint size growing indefinitely
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in

Mailing list threading improvements

2023-08-17 Thread Christofer Dutz
TL;DR: We’re updating how auto-generated email from Github will be
threaded on your mailing lists. If you want to keep the old defaults,
details are below.

We’re pleased to let you know that we’re tweaking the way that auto-
generated email from Github will appear on your mailing lists. This
will lead to more human-readable subject lines, and the ability of most
modern mail clients to correctly thread discussions originating on
Github.

Background: Many project mailing lists receive email auto-generated by
Github. The way that the subject lines are crafted leads to messages
from the same topic not being threaded together by most mail clients.
We’re fixing that.

The way that these messages are threaded is defined by a file,
.asf.yml, in your git repositories. We’re changing the way that it
will work by default if you don’t choose settings. If you’re happy for
us to make this change, don’t do anything; the change will happen on
October 1st, 2023.

Details of the current default, as well as the proposed changes, are on
the following page, along with instructions on how to keep your current
settings, if you prefer:

https://community.apache.org/contributors/mailing-lists.html#configuring-the-subject-lines-of-the-emails-being-sent

Please copy d...@community.apache.org
on any feedback.

Chris, on behalf of the Comdev PMC