Re: What to do about issues that track flaky tests?

2022-09-14 Thread Manu Zhang
Agreed. I also mentioned in a previous email that some issues have been
open for a long time (before being migrated to GitHub), and it's possible
that those tests now pass consistently.
We may double-check and close them, since reopening is just one click.

Manu

On Thu, Sep 15, 2022 at 6:58 AM Austin Bennett 
wrote:

> +1 to being realistic -- proper labels are worthwhile. Though, some flaky
> tests probably should be P1, and just because one isn't addressed in a timely
> manner doesn't mean it isn't a P1 - though, it does mean it wasn't
> addressed.
>
>
>
> On Wed, Sep 14, 2022 at 1:19 PM Kenneth Knowles  wrote:
>
>> I would like to make this alert email actionable.
>>
>> I went through most of these issues. About half are P1 "flake" issues. I
>> don't think magically expecting them to be deflaked is helpful. So I have a
>> couple ideas:
>>
>> 1. Exclude "flake" P1s from this email. This is what we used to do. But
>> then... are they really P1s?
>> 2. Make "flake" bugs P2 if they are not currently impacting our test
>> signal. But then... we may have a gap in test coverage that could cause
>> severe problems. But anyhow something that is P1 for a long time is not
>> *really* P1, so it is just being realistic.
>>
>> What do you all think?
>>
>> Kenn
>>
>> On Wed, Sep 14, 2022 at 3:03 AM  wrote:
>>
>>> This is your daily summary of Beam's current high priority issues that
>>> may need attention.
>>>
>>> See https://beam.apache.org/contribute/issue-priorities for the
>>> meaning and expectations around issue priorities.
>>>
>>> Unassigned P1 Issues:
>>>
>>> https://github.com/apache/beam/issues/23227 [Bug]: Python SDK
>>> installation cannot generate proto with protobuf 3.20.2
>>> https://github.com/apache/beam/issues/23179 [Bug]: Parquet size
>>> exploded for no apparent reason
>>> https://github.com/apache/beam/issues/22913 [Bug]:
>>> beam_PostCommit_Java_ValidatesRunner_Flink is flakey
>>> https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka
>>> SDF and fix known and discovered issues
>>> https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze
>>> at getConnection() in WriteFn
>>> https://github.com/apache/beam/issues/21794 Dataflow runner creates a
>>> new timer whenever the output timestamp is change
>>> https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't
>>> get output to Failed Inserts PCollection
>>> https://github.com/apache/beam/issues/21704
>>> beam_PostCommit_Java_DataflowV2 failures parent bug
>>> https://github.com/apache/beam/issues/21701
>>> beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
>>> https://github.com/apache/beam/issues/21700
>>> --dataflowServiceOptions=use_runner_v2 is broken
>>> https://github.com/apache/beam/issues/21696 Flink Tests failure :
>>> java.lang.NoClassDefFoundError: Could not initialize class
>>> org.apache.beam.runners.core.construction.SerializablePipelineOptions
>>> https://github.com/apache/beam/issues/21695 DataflowPipelineResult does
>>> not raise exception for unsuccessful states.
>>> https://github.com/apache/beam/issues/21694 BigQuery Storage API insert
>>> with writeResult retry and write to error table
>>> https://github.com/apache/beam/issues/21480 flake:
>>> FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
>>> https://github.com/apache/beam/issues/21472 Dataflow streaming tests
>>> failing new AfterSynchronizedProcessingTime test
>>> https://github.com/apache/beam/issues/21471 Flakes: Failed to load
>>> cache entry
>>> https://github.com/apache/beam/issues/21470 Test flake:
>>> test_split_half_sdf
>>> https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink
>>> flaky: Connection refused
>>> https://github.com/apache/beam/issues/21468
>>> beam_PostCommit_Python_Examples_Dataflow failing
>>> https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming
>>> Java load tests failing
>>> https://github.com/apache/beam/issues/21465 Kafka commit offset drop
>>> data on failure for runners that have non-checkpointing shuffle
>>> https://github.com/apache/beam/issues/21463 NPE in Flink Portable
>>> ValidatesRunner streaming suite
>>> https://github.com/apache/beam/issues/21462 Flake in
>>> org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in
>>> use
>>> https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT
>>> flaky in beam_PostCommit_Java_DataflowV2
>>> https://github.com/apache/beam/issues/21270
>>> org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
>>> flaky on Dataflow Runner V2
>>> https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a
>>> duplicate BQ load job if a 503 error code is returned from googleapi
>>> https://github.com/apache/beam/issues/21266
>>> org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
>>> is flaky in Java ValidatesRunner Flink suite.
>>> https://github.com/apache/beam/issues/21262 

Re: [PROPOSAL] Preparing for 2.42.0 Release

2022-09-14 Thread Robert Burke
I'm feeling much better.

Cherry picks have quieted down, so I'll be working on an RC0 tomorrow.

Your friendly neighborhood release manager
Robert Burke


On Thu, Sep 8, 2022, 3:01 PM Robert Burke  wrote:

> My recent travels appear to be preventing me from working at my best and
> focusing on getting the release out this week.
>
> I am going to take a day/weekend to rest so I can work better next week.
> Fortunately, I've been rather thoroughly testing Covid negative since
> my trip.
>
> While I'll be resting, I will still be keeping an eye on emails and Cherry
> Pick requests for 2.42.0 that come across my Github dashboard (@lostluck).
> If you have fixes for any regressions, please do incorporate them into the
> main branch, and send the cherry pick PR to the release-2.42.0 branch for
> my review.
>
> Thank you for your patience with this delay; release processes will
> resume in earnest on Monday the 12th PT.
>
> Your friendly neighbourhood release manager,
> Robert Burke
>
>
> On Wed, Sep 7, 2022 at 6:21 PM Robert Burke  wrote:
>
>> A release branch has been cut!
>>
>> https://github.com/apache/beam/tree/release-2.42.0
>>
>> However, I don't have time to start further verification procedures. The
>> plan is to get them going tomorrow morning PT, and proceed to a quick RC so
>> that community verification can begin promptly. It's not anticipated that
>> that will be the final RC.
>>
>> Either an RC will be published tomorrow, with its accompanying RC
>> thread, or this thread will be updated by EoD.
>>
>> Thank you for your patience,
>> Your friendly neighbourhood release manager
>> Robert Burke
>>
>> On Wed, Sep 7, 2022 at 10:55 AM Robert Burke  wrote:
>>
>>> One issue remains open in the 2.42.0 milestone [1], blocking the cut.
>>>
>>> Since it's a BOM update I'm inclined to wait for it, rather than cause a
>>> cherry pick to invalidate all prior testing.
>>>
>>> The PR [2] is making good progress in ensuring no linkage issues, so I
>>> still anticipate that the 2.42.0 branch cut will happen by EoD today
>>> Pacific Time.
>>>
>>> If that doesn't occur, I'll send another email to this thread with an
>>> update.
>>>
>>> Thank you for your patience,
>>> Your friendly neighbourhood release manager
>>> Robert Burke
>>>
>>> [1] https://github.com/apache/beam/milestone/4
>>> [2] https://github.com/apache/beam/pull/22996
>>>
>>> On Wed, Aug 24, 2022 at 3:05 PM Robert Burke 
>>> wrote:
>>>
 Hi everyone!

 The next (2.42.0) release branch cut is scheduled for Sept 7th,
 according to
 the release calendar [1].

 I would like to volunteer myself to do this release. My plan is to cut
 the branch on that date, and cherrypick release-blocking fixes afterwards,
 if any.

 Please help me make sure the release goes smoothly by:
 - Making sure that any unresolved release-blocking issues for 2.42.0
 have their "Milestone" marked as "2.42.0 Release" as soon as possible.
 - Reviewing the current release blockers [2] and removing the Milestone
 if they don't meet the criteria at [3].

 Let me know if you have any comments/objections/questions.

 Due to travel, I will be unavailable to answer any emails from August
 26th until Sept 5th, but cut-blocking concerns will be resolved before I
 make the cut, and I will delay it if necessary. This thread will be notified
 in that case.

 Thanks,
 Robert Burke

 [1]
 https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
 [2] https://github.com/apache/beam/milestone/4
 [3] https://beam.apache.org/contribute/release-blocking/

>>>


Re: What to do about issues that track flaky tests?

2022-09-14 Thread Austin Bennett
+1 to being realistic -- proper labels are worthwhile. Though, some flaky
tests probably should be P1, and just because one isn't addressed in a timely
manner doesn't mean it isn't a P1 - though, it does mean it wasn't
addressed.



On Wed, Sep 14, 2022 at 1:19 PM Kenneth Knowles  wrote:

> I would like to make this alert email actionable.
>
> I went through most of these issues. About half are P1 "flake" issues. I
> don't think magically expecting them to be deflaked is helpful. So I have a
> couple ideas:
>
> 1. Exclude "flake" P1s from this email. This is what we used to do. But
> then... are they really P1s?
> 2. Make "flake" bugs P2 if they are not currently impacting our test
> signal. But then... we may have a gap in test coverage that could cause
> severe problems. But anyhow something that is P1 for a long time is not
> *really* P1, so it is just being realistic.
>
> What do you all think?
>
> Kenn
>
> On Wed, Sep 14, 2022 at 3:03 AM  wrote:
>
>> This is your daily summary of Beam's current high priority issues that
>> may need attention.
>>
>> See https://beam.apache.org/contribute/issue-priorities for the
>> meaning and expectations around issue priorities.
>>
>> Unassigned P1 Issues:
>>
>> https://github.com/apache/beam/issues/23227 [Bug]: Python SDK
>> installation cannot generate proto with protobuf 3.20.2
>> https://github.com/apache/beam/issues/23179 [Bug]: Parquet size exploded
>> for no apparent reason
>> https://github.com/apache/beam/issues/22913 [Bug]:
>> beam_PostCommit_Java_ValidatesRunner_Flink is flakey
>> https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka
>> SDF and fix known and discovered issues
>> https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze
>> at getConnection() in WriteFn
>> https://github.com/apache/beam/issues/21794 Dataflow runner creates a
>> new timer whenever the output timestamp is change
>> https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get
>> output to Failed Inserts PCollection
>> https://github.com/apache/beam/issues/21704
>> beam_PostCommit_Java_DataflowV2 failures parent bug
>> https://github.com/apache/beam/issues/21701
>> beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
>> https://github.com/apache/beam/issues/21700
>> --dataflowServiceOptions=use_runner_v2 is broken
>> https://github.com/apache/beam/issues/21696 Flink Tests failure :
>> java.lang.NoClassDefFoundError: Could not initialize class
>> org.apache.beam.runners.core.construction.SerializablePipelineOptions
>> https://github.com/apache/beam/issues/21695 DataflowPipelineResult does
>> not raise exception for unsuccessful states.
>> https://github.com/apache/beam/issues/21694 BigQuery Storage API insert
>> with writeResult retry and write to error table
>> https://github.com/apache/beam/issues/21480 flake:
>> FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
>> https://github.com/apache/beam/issues/21472 Dataflow streaming tests
>> failing new AfterSynchronizedProcessingTime test
>> https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache
>> entry
>> https://github.com/apache/beam/issues/21470 Test flake:
>> test_split_half_sdf
>> https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink
>> flaky: Connection refused
>> https://github.com/apache/beam/issues/21468
>> beam_PostCommit_Python_Examples_Dataflow failing
>> https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java
>> load tests failing
>> https://github.com/apache/beam/issues/21465 Kafka commit offset drop
>> data on failure for runners that have non-checkpointing shuffle
>> https://github.com/apache/beam/issues/21463 NPE in Flink Portable
>> ValidatesRunner streaming suite
>> https://github.com/apache/beam/issues/21462 Flake in
>> org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in
>> use
>> https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky
>> in beam_PostCommit_Java_DataflowV2
>> https://github.com/apache/beam/issues/21270
>> org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
>> flaky on Dataflow Runner V2
>> https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a
>> duplicate BQ load job if a 503 error code is returned from googleapi
>> https://github.com/apache/beam/issues/21266
>> org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
>> is flaky in Java ValidatesRunner Flink suite.
>> https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do
>> not follow spec
>> https://github.com/apache/beam/issues/21261
>> org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
>> is flaky
>> https://github.com/apache/beam/issues/21260 Python DirectRunner does not
>> emit data at GC time
>> https://github.com/apache/beam/issues/21257 Either Create or
>> DirectRunner 

What to do about issues that track flaky tests?

2022-09-14 Thread Kenneth Knowles
I would like to make this alert email actionable.

I went through most of these issues. About half are P1 "flake" issues. I
don't think magically expecting them to be deflaked is helpful. So I have a
couple ideas:

1. Exclude "flake" P1s from this email. This is what we used to do. But
then... are they really P1s?
2. Make "flake" bugs P2 if they are not currently impacting our test
signal. But then... we may have a gap in test coverage that could cause
severe problems. But anyhow something that is P1 for a long time is not
*really* P1, so it is just being realistic.

What do you all think?

Kenn

On Wed, Sep 14, 2022 at 3:03 AM  wrote:

> This is your daily summary of Beam's current high priority issues that may
> need attention.
>
> See https://beam.apache.org/contribute/issue-priorities for the
> meaning and expectations around issue priorities.
>
> Unassigned P1 Issues:
>
> https://github.com/apache/beam/issues/23227 [Bug]: Python SDK
> installation cannot generate proto with protobuf 3.20.2
> https://github.com/apache/beam/issues/23179 [Bug]: Parquet size exploded
> for no apparent reason
> https://github.com/apache/beam/issues/22913 [Bug]:
> beam_PostCommit_Java_ValidatesRunner_Flink is flakey
> https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka
> SDF and fix known and discovered issues
> https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at
> getConnection() in WriteFn
> https://github.com/apache/beam/issues/21794 Dataflow runner creates a new
> timer whenever the output timestamp is change
> https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get
> output to Failed Inserts PCollection
> https://github.com/apache/beam/issues/21704
> beam_PostCommit_Java_DataflowV2 failures parent bug
> https://github.com/apache/beam/issues/21701
> beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
> https://github.com/apache/beam/issues/21700
> --dataflowServiceOptions=use_runner_v2 is broken
> https://github.com/apache/beam/issues/21696 Flink Tests failure :
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.beam.runners.core.construction.SerializablePipelineOptions
> https://github.com/apache/beam/issues/21695 DataflowPipelineResult does
> not raise exception for unsuccessful states.
> https://github.com/apache/beam/issues/21694 BigQuery Storage API insert
> with writeResult retry and write to error table
> https://github.com/apache/beam/issues/21480 flake:
> FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
> https://github.com/apache/beam/issues/21472 Dataflow streaming tests
> failing new AfterSynchronizedProcessingTime test
> https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache
> entry
> https://github.com/apache/beam/issues/21470 Test flake:
> test_split_half_sdf
> https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink
> flaky: Connection refused
> https://github.com/apache/beam/issues/21468
> beam_PostCommit_Python_Examples_Dataflow failing
> https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java
> load tests failing
> https://github.com/apache/beam/issues/21465 Kafka commit offset drop data
> on failure for runners that have non-checkpointing shuffle
> https://github.com/apache/beam/issues/21463 NPE in Flink Portable
> ValidatesRunner streaming suite
> https://github.com/apache/beam/issues/21462 Flake in
> org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in
> use
> https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky
> in beam_PostCommit_Java_DataflowV2
> https://github.com/apache/beam/issues/21270
> org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
> flaky on Dataflow Runner V2
> https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a
> duplicate BQ load job if a 503 error code is returned from googleapi
> https://github.com/apache/beam/issues/21266
> org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
> is flaky in Java ValidatesRunner Flink suite.
> https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do
> not follow spec
> https://github.com/apache/beam/issues/21261
> org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
> is flaky
> https://github.com/apache/beam/issues/21260 Python DirectRunner does not
> emit data at GC time
> https://github.com/apache/beam/issues/21257 Either Create or DirectRunner
> fails to produce all elements to the following transform
> https://github.com/apache/beam/issues/21123 Multiple jobs running on
> Flink session cluster reuse the persistent Python environment.
> https://github.com/apache/beam/issues/21121
> apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
> flakey
> https://github.com/apache/beam/issues/21118
> 

Beam Community Meetup

2022-09-14 Thread hello

Join the next Beam Community Meetup!

Beam Community Meetup
Thursday Sep 22, 2022 ⋅ 11:30am – 12:30pm
Central Time - Mexico City

Location
https://www.crowdcast.io/e/beam-community-meetup



A Practitioner's View of Beam, by Byron Ellis

Join us via Crowdcast! Register here! In this talk, Byron will discuss what
motivated him to join the Beam team and where the model benefits data
practitioners (data engineering, machine learning, analysts, etc.). He'll
share some thoughts on how we can use these concepts to build more scalable
and manageable data pipelines. Learn more about this meetup by consulting
our agenda. Remember, we'll have a Q&A session at the end of the meetup!




Re: [Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-14 Thread Alexey Romanenko
Ahh, great! I didn't know that the 'beam-perf' label is used for that.
Thanks!

> On 14 Sep 2022, at 17:47, Andrew Pilloud  wrote:
> 
> We do have a dedicated machine for benchmarks. This is a single
> machine limited to running one test at a time. Set the
> jenkinsExecutorLabel for the job to 'beam-perf' to use it. For
> example:
> https://github.com/apache/beam/blob/66bbee84ed477d86008905646e68b100591b6f78/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Direct.groovy#L36
> 
> Andrew
> 
> On Wed, Sep 14, 2022 at 8:28 AM Alexey Romanenko
>  wrote:
>> 
>> I think it depends on the goal of running these benchmarks. In the ideal
>> case, we need to run them on the same dedicated machine(s) and with the same
>> configuration all the time, but I'm not sure that can be achieved in the
>> current infrastructure reality.
>> 
>> On the other hand, IIRC, the initial goal of benchmarks like Nexmark was to
>> quickly detect any major regressions, especially between releases, which is
>> not so sensitive to ideal conditions. And here we have a field for improvement.
>> 
>> —
>> Alexey
>> 
>> On 13 Sep 2022, at 22:57, Kenneth Knowles  wrote:
>> 
>> Good idea. I'm curious about our current benchmarks. Some of them run on 
>> clusters, but I think some of them are running locally and just being noisy. 
>> Perhaps this could improve that. (or if they are running on local 
>> Spark/Flink then maybe the results are not really meaningful anyhow)
>> 
>> On Tue, Sep 13, 2022 at 2:54 AM Moritz Mack  wrote:
>>> 
>>> Hi team,
>>> 
>>> 
>>> 
>>> I’m looking for some help to set up infrastructure to periodically run Java 
>>> microbenchmarks (JMH).
>>> 
>>> Results of these runs will be added to our community metrics (InfluxDB) to 
>>> help us track performance, see [1].
>>> 
>>> 
>>> 
>>> To prevent noisy runs this would require a dedicated Jenkins machine that 
>>> runs at most one job (benchmark) at a time. Benchmark runs take quite some 
>>> time, but on the other hand they don’t have to run very frequently (once a 
>>> week should be fine initially).
>>> 
>>> 
>>> 
>>> Thanks so much,
>>> 
>>> Moritz
>>> 
>>> 
>>> 
>>> [1] https://github.com/apache/beam/pull/23041
>>> 
>>> 
>>> 
>>> 
>> 



Re: (Golang)HOW TO WRITE BIGQUERY TABLE USING SUSCRIBE TO PUBSUB MESSAGE.THROUGH APACHE BEAM IN GOLANG.

2022-09-14 Thread Ritesh Ghorse via dev
Hi,

bigqueryio.Write() infers the schema type from the PCollection you are
writing. In your code, you are directly writing the output of pubsubio.Read,
which is of type []byte. You need to insert a ParDo that decodes and converts
it to your schema type (a schema in the Go SDK is just a struct with exported
fields; you can add additional metadata with struct tags) before writing it
to BigQuery. You can refer to this for an example:
https://github.com/apache/beam/blob/master/sdks/go/examples/cookbook/max/max.go
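
As an illustration, here is a minimal, untested sketch of that pattern: a
ParDo that JSON-decodes the Pub/Sub payload into an exported struct before
calling bigqueryio.Write, so the BigQuery schema can be inferred from the
struct. The Event type, its JSON field names, and the flag names are
illustrative assumptions rather than part of the original code.

package main

import (
	"context"
	"encoding/json"
	"flag"
	"log"
	"reflect"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/pubsubio"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
)

// Event is a hypothetical schema type; exported fields become BigQuery columns.
type Event struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

var (
	project      = flag.String("project", "", "GCP project")
	input        = flag.String("input", "", "Pub/Sub topic")
	subscription = flag.String("subscription", "", "Pub/Sub subscription ID")
	output       = flag.String("output", "", "BigQuery table: project:dataset.table")
)

func init() {
	beam.RegisterType(reflect.TypeOf((*Event)(nil)).Elem())
	beam.RegisterFunction(decodeFn)
}

// decodeFn converts the raw []byte payload read from Pub/Sub into an Event.
func decodeFn(b []byte, emit func(Event)) error {
	var e Event
	if err := json.Unmarshal(b, &e); err != nil {
		return err
	}
	emit(e)
	return nil
}

func main() {
	flag.Parse()
	beam.Init()

	p := beam.NewPipeline()
	s := p.Root()

	// pubsubio.Read yields a PCollection of []byte.
	raw := pubsubio.Read(s, *project, *input, &pubsubio.ReadOptions{Subscription: *subscription})
	// Decode into the struct so bigqueryio.Write can infer the table schema.
	events := beam.ParDo(s, decodeFn, raw)
	bigqueryio.Write(s, *project, *output, events)

	if err := beamx.Run(context.Background(), p); err != nil {
		log.Fatalf("pipeline failed: %v", err)
	}
}

The key point is the intermediate decode step; the init() registrations help
distributed runners resolve the DoFn and the Event type by name.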



On Wed, Sep 14, 2022 at 11:45 AM Abyakta Bal  wrote:

> *Correcting this:*
> *I am using this code to read from the Pub/Sub message:*
> col := pubsubio.Read(s, project, *input, &pubsubio.ReadOptions{
> Subscription: sub.ID()})
> *I am using this code to write to BigQuery:*
> bigqueryio.Write(s, project, *output, col)
>
> On Wed, Sep 14, 2022 at 6:39 PM Abyakta Bal 
> wrote:
>
>> Dear Team,
>> I am trying to write to a BigQuery table from a Pub/Sub message
>> pull/subscribe, but I am unable to convert a *beam.PCollection of []uint8/bytes*
>> to a *beam.PCollection of a schema type* in the Go Apache Beam SDK. I am facing
>> this issue: *schema type must be struct: []uint8*
>>
>> *I am using this code to read from the Pub/Sub message:*
>> col := pubsubio.Read(s, project, *input, &pubsubio.ReadOptions{
>> Subscription: sub.ID()})
>> *I am using this code to write to BigQuery:*
>> bigqueryio.Write(s, project, *output, out)
>>
>> I request you all to please help me solve this problem. If any
>> clarification is required, please let me know and we can arrange a call.
>>
>>
>> Thank you,
>> Abyakta bal
>>
>>
>


Re: [Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-14 Thread Andrew Pilloud
We do have a dedicated machine for benchmarks. This is a single
machine limited to running one test at a time. Set the
jenkinsExecutorLabel for the job to 'beam-perf' to use it. For
example:
https://github.com/apache/beam/blob/66bbee84ed477d86008905646e68b100591b6f78/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Direct.groovy#L36

Andrew

On Wed, Sep 14, 2022 at 8:28 AM Alexey Romanenko
 wrote:
>
> I think it depends on the goal of running these benchmarks. In the ideal case,
> we need to run them on the same dedicated machine(s) and with the same
> configuration all the time, but I'm not sure that can be achieved in the
> current infrastructure reality.
>
> On the other hand, IIRC, the initial goal of benchmarks like Nexmark was to
> quickly detect any major regressions, especially between releases, which is
> not so sensitive to ideal conditions. And here we have a field for improvement.
>
> —
> Alexey
>
> On 13 Sep 2022, at 22:57, Kenneth Knowles  wrote:
>
> Good idea. I'm curious about our current benchmarks. Some of them run on 
> clusters, but I think some of them are running locally and just being noisy. 
> Perhaps this could improve that. (or if they are running on local Spark/Flink 
> then maybe the results are not really meaningful anyhow)
>
> On Tue, Sep 13, 2022 at 2:54 AM Moritz Mack  wrote:
>>
>> Hi team,
>>
>>
>>
>> I’m looking for some help to set up infrastructure to periodically run Java 
>> microbenchmarks (JMH).
>>
>> Results of these runs will be added to our community metrics (InfluxDB) to 
>> help us track performance, see [1].
>>
>>
>>
>> To prevent noisy runs this would require a dedicated Jenkins machine that 
>> runs at most one job (benchmark) at a time. Benchmark runs take quite some 
>> time, but on the other hand they don’t have to run very frequently (once a 
>> week should be fine initially).
>>
>>
>>
>> Thanks so much,
>>
>> Moritz
>>
>>
>>
>> [1] https://github.com/apache/beam/pull/23041
>>
>>
>>
>>
>


Re: (Golang)HOW TO WRITE BIGQUERY TABLE USING SUSCRIBE TO PUBSUB MESSAGE.THROUGH APACHE BEAM IN GOLANG.

2022-09-14 Thread Abyakta Bal
*Correcting this:*
*I am using this code to read from the Pub/Sub message:*
col := pubsubio.Read(s, project, *input, &pubsubio.ReadOptions{Subscription:
sub.ID()})
*I am using this code to write to BigQuery:*
bigqueryio.Write(s, project, *output, col)

On Wed, Sep 14, 2022 at 6:39 PM Abyakta Bal  wrote:

> Dear Team,
> I am trying to write to a BigQuery table from a Pub/Sub message pull/subscribe,
> but I am unable to convert a *beam.PCollection of []uint8/bytes* to
> a *beam.PCollection of a schema type* in the Go Apache Beam SDK. I am facing
> this issue: *schema type must be struct: []uint8*
>
> *I am using this code to read from the Pub/Sub message:*
> col := pubsubio.Read(s, project, *input, &pubsubio.ReadOptions{
> Subscription: sub.ID()})
> *I am using this code to write to BigQuery:*
> bigqueryio.Write(s, project, *output, out)
>
> I request you all to please help me solve this problem. If any clarification
> is required, please let me know and we can arrange a call.
>
>
> Thank you,
> Abyakta bal
>
>


(Golang)HOW TO WRITE BIGQUERY TABLE USING SUSCRIBE TO PUBSUB MESSAGE.THROUGH APACHE BEAM IN GOLANG.

2022-09-14 Thread Abyakta Bal
Dear Team,
I am trying to write to a BigQuery table from a Pub/Sub message pull/subscribe,
but I am unable to convert a *beam.PCollection of []uint8/bytes* to
a *beam.PCollection of a schema type* in the Go Apache Beam SDK. I am facing
this issue: *schema type must be struct: []uint8*

*I am using this code to read from the Pub/Sub message:*
col := pubsubio.Read(s, project, *input, &pubsubio.ReadOptions{Subscription:
sub.ID()})
*I am using this code to write to BigQuery:*
bigqueryio.Write(s, project, *output, out)

I request you all to please help me solve this problem. If any clarification
is required, please let me know and we can arrange a call.


Thank you,
Abyakta bal


Re: [Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-14 Thread Alexey Romanenko
I think it depends on the goal of running these benchmarks. In the ideal case, we
need to run them on the same dedicated machine(s) and with the same
configuration all the time, but I'm not sure that can be achieved in the current
infrastructure reality.

On the other hand, IIRC, the initial goal of benchmarks like Nexmark was to
quickly detect any major regressions, especially between releases, which is not so
sensitive to ideal conditions. And here we have a field for improvement.

—
Alexey

> On 13 Sep 2022, at 22:57, Kenneth Knowles  wrote:
> 
> Good idea. I'm curious about our current benchmarks. Some of them run on 
> clusters, but I think some of them are running locally and just being noisy. 
> Perhaps this could improve that. (or if they are running on local Spark/Flink 
> then maybe the results are not really meaningful anyhow)
> 
> On Tue, Sep 13, 2022 at 2:54 AM Moritz Mack  > wrote:
> Hi team,
> 
>  
> 
> I’m looking for some help to set up infrastructure to periodically run Java 
> microbenchmarks (JMH).
> 
> Results of these runs will be added to our community metrics (InfluxDB) to 
> help us track performance, see [1]. 
> 
>  
> 
> To prevent noisy runs this would require a dedicated Jenkins machine that 
> runs at most one job (benchmark) at a time. Benchmark runs take quite some 
> time, but on the other hand they don’t have to run very frequently (once a 
> week should be fine initially).
> 
>  
> 
> Thanks so much,
> 
> Moritz
> 
>  
> 
> [1] https://github.com/apache/beam/pull/23041 
> 
> 



Beam High Priority Issue Report (54)

2022-09-14 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/23227 [Bug]: Python SDK installation 
cannot generate proto with protobuf 3.20.2
https://github.com/apache/beam/issues/23179 [Bug]: Parquet size exploded for no 
apparent reason
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakey
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and 
fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer 
whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 
failures parent bug
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 
failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 
--dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with 
writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing 
new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21468 
beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load 
tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on 
failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable 
ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in 
beam_PostCommit_Java_DataflowV2  
https://github.com/apache/beam/issues/21270 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate 
BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21257 Either Create or DirectRunner fails 
to produce all elements to the following transform
https://github.com/apache/beam/issues/21123 Multiple jobs running on Flink 
session cluster reuse the persistent Python environment.
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21118 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky
https://github.com/apache/beam/issues/21114 Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN
https://github.com/apache/beam/issues/21113 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky
https://github.com/apache/beam/issues/2 Java creates an incorrect pipeline 
proto when core-construction-java jar is not in the CLASSPATH
https://github.com/apache/beam/issues/21109 SDF BoundedSource seems to execute 
significantly slower than 'normal' BoundedSource
https://github.com/apache/beam/issues/20981 Python precommit flaky: Failed to 
read inputs in the data plane
https://github.com/apache/beam/issues/20977 SamzaStoreStateInternalsTest is 
flaky