Re: [ANNOUNCE] Transform Service

2023-08-10 Thread Ahmet Altay via dev
Congratulations! This is a great usability improvement, lowering the bar
for using multi-language features.

On Thu, Aug 10, 2023 at 3:48 PM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> Hi All,
>
> We recently added a Docker Compose-based service named Transform Service
> to Beam.
>
> The Transform Service includes a number of transforms released with Beam
> and provides a single endpoint for accessing them via Beam's
> multi-language pipelines framework.
>
> I've updated the Beam Java and Python SDKs to automatically use this
> service to expand cross-language transforms used by multi-language
> pipelines when possible. This means that Beam pipelines can use
> cross-language transforms without installing other language runtimes if
> Docker (and Docker Compose, which comes with Docker) is available locally
> at job submission. Go SDK updates are in development.
>
> Users also have the option to manually start up a Transform Service with
> utilities provided with Beam SDKs if needed.
>
> For more details regarding the Transform Service please see the
> documentation here.
>
> A list of transforms currently included with the Transform Service is
> available here.
>
> Please see here for a previous discussion on this and please let me know
> if you have any questions.
>
> Thanks,
> Cham
>
>


[ANNOUNCE] Transform Service

2023-08-10 Thread Chamikara Jayalath via dev
Hi All,

We recently added a Docker Compose-based service named Transform Service to
Beam.

The Transform Service includes a number of transforms released with Beam
and provides a single endpoint for accessing them via Beam's multi-language
pipelines framework.

I've updated the Beam Java and Python SDKs to automatically use this
service to expand cross-language transforms used by multi-language
pipelines when possible. This means that Beam pipelines can use
cross-language transforms without installing other language runtimes if
Docker (and Docker Compose, which comes with Docker) is available locally
at job submission. Go SDK updates are in development.

Users also have the option to manually start up a Transform Service with
utilities provided with Beam SDKs if needed.
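
As a rough illustration of the manual startup option, the sketch below only
builds the command line for the Python SDK's launcher utility; it does not
run it. The module path and flag names are assumptions based on the
Transform Service documentation and should be checked against the Beam
version in use. The helper name transform_service_command and the port,
version, and project name values are hypothetical placeholders.

```python
import sys


def transform_service_command(port, beam_version, project_name, command="up"):
    """Build the argv list for the Python SDK's Transform Service launcher.

    Assumption: the module path and flags match the Transform Service
    documentation; verify them against your installed Beam version.
    """
    return [
        sys.executable, "-m", "apache_beam.utils.transform_service_launcher",
        "--port", str(port),
        "--beam_version", beam_version,
        "--project_name", project_name,
        "--command", command,
    ]


if __name__ == "__main__":
    # Placeholder values: choose your own port, Beam version, and project name.
    cmd = transform_service_command(12345, "2.49.0", "my-transform-service")
    print(" ".join(cmd))
    # Actually starting the service requires Docker and apache_beam, e.g.:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing command="down" to the same helper would build the matching shutdown
command, again assuming the launcher accepts that value.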

For more details regarding the Transform Service please see the
documentation here.

A list of transforms currently included with the Transform Service is
available here.

Please see here for a previous discussion on this and please let me know
if you have any questions.

Thanks,
Cham


Re: [Discuss] Get rid of OWNERS files

2023-08-10 Thread Robert Bradshaw via dev
On Tue, Aug 8, 2023 at 9:50 AM Robert Burke wrote:
>
> Either we keep OWNERS and have the review bot use them, or we remove them and
> use the review bot config as the single source of truth.

+1. And I don't see any reason we're going to be any better at keeping
them up to date than we have been in the past, so let's just remove them.

> The bot is less likely to go out of date since it's at least active in how it 
> behaves. I agree it doesn't necessarily solve the problem of things getting 
> out of date, but other than inactive folks officially, actively bowing out of 
> the project, I don't know there's anything we can do.
>
> IMO folks who aren't active but are still getting emails and review requests 
> should be incentivised to redirect requests to new owners or at least active 
> members.
>
>
> On Tue, Aug 8, 2023, 9:13 AM Alexey Romanenko wrote:
>>
>> I generally agree with this (initially that was a good intention imho) but
>> what could be an alternative for this? The review bot may also assign reviewers
>> that are no longer active on the project.
>>
>> —
>> Alexey
>>
>>
>> On 8 Aug 2023, at 16:55, Danny McCormick via dev wrote:
>>
>> Hey everyone, I'd like to propose getting rid of OWNERS files from the Beam 
>> repo. Right now, I don't think they are serving a meaningful purpose:
>>
>> - Many OWNERS files are outdated and point to people who are no longer 
>> actively involved in the project (examples: 1, 2, 3, there are many more)
>> - Many dependencies don't have owners assigned
>> - Many major directories function fine without OWNERS files
>> - We lack sufficient documentation of what OWNERS files mean 
>> (https://s.apache.org/beam-owners is not helpful and I couldn't find other 
>> resources)
>> - We now have the review bot to automatically assign reviewers based on 
>> areas of ownership. That has proven more likely to stay up to date.
>>
>> Given all of these, I don't see any obvious usefulness for OWNERS files. 
>> Please chime in if you disagree (or agree). If there are no objections I'll 
>> assume silent consensus and remove them next week.
>>
>> Thanks,
>> Danny
>>
>>


Re: KafkaIO Parameter Issue | Runtime PipelineOptions | Apache Beam

2023-08-10 Thread himanshu singhal
Hello Team,
Could you please provide an update on this?



Thanks & Regards
*Himanshu Singhal*
M:- +917821076244
E:- singhal.himansh...@gmail.com


On Sat, Aug 5, 2023 at 8:31 AM himanshu singhal <
singhal.himansh...@gmail.com> wrote:

> Hello Beam Team,
>
> I am using Apache Beam to read from Kafka with KafkaIO on the Dataflow
> runner, and I am having trouble making the KafkaIO parameters (such as
> the config and topics) dynamic. When I expose these parameters as
> PipelineOptions using RuntimeValueProvider, my Dataflow template is not
> created and I get errors saying that KafkaIO does not support
> RuntimeValueProvider. Could you please suggest an approach or share
> sample code for exposing these parameters as PipelineOptions?
>
>
>
> Thanks & Regards
> *Himanshu Singhal*
> M:- +917821076244
> E:- singhal.himansh...@gmail.com
>


Beam High Priority Issue Report (39)

2023-08-10 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. 
PeriodicImpulse) running in Flink and polling using tracker.defer_remainder 
have checkpoint size growing indefinitely 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to 
SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit 
is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in