Re: [JmsIO] => Pull Request to fix message acknowledgement issue

2022-08-29 Thread Jean-Baptiste Onofré
Hi Vincent,

thanks, I will take a look (as original JmsIO author ;)).

Regards
JB

On Mon, Aug 29, 2022 at 6:43 PM BALLADA Vincent
 wrote:
>
> Hi all,
>
>
>
> Here is a PR related to the following issue (Runner acknowledges messages on 
> closed session):
>
> https://github.com/apache/beam/issues/20814
>
>
>
> And here is a documentation explaining the fix:
>
> https://docs.google.com/document/d/19HiNPoJeIlzCFyWGdlw7WEFutceL2AmN_5Z0Vmi1s-I/edit?usp=sharing
>
>
>
> And finally the PR:
>
> https://github.com/apache/beam/pull/22932
>
>
>
> Regards
>
>
>
> Vincent BALLADA
>
>
>
>
>
>
> Confidential C
>
> -- Disclaimer 
> Ce message ainsi que les eventuelles pieces jointes constituent une 
> correspondance privee et confidentielle a l'attention exclusive du 
> destinataire designe ci-dessus. Si vous n'etes pas le destinataire du present 
> message ou une personne susceptible de pouvoir le lui delivrer, il vous est 
> signifie que toute divulgation, distribution ou copie de cette transmission 
> est strictement interdite. Si vous avez recu ce message par erreur, nous vous 
> remercions d'en informer l'expediteur par telephone ou de lui retourner le 
> present message, puis d'effacer immediatement ce message de votre systeme.
>
> *** This e-mail and any attachments is a confidential correspondence intended 
> only for use of the individual or entity named above. If you are not the 
> intended recipient or the agent responsible for delivering the message to the 
> intended recipient, you are hereby notified that any disclosure, distribution 
> or copying of this communication is strictly prohibited. If you have received 
> this communication in error, please notify the sender by phone or by replying 
> this message, and then delete this message from your system.


RE: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.

2022-08-29 Thread Neil Kolban via dev
Howdy,
I have a client who would be interested to use this.  Is there a link to a
GitHub repo or other place I can read more?

Neil  (kol...@google.com)

On 2022/08/05 07:23:31 张涛 wrote:
>
> Hi, we developed a new IO connector named DataLakeIO, to connect Beam and
data lake, such as Delta Lake, Apache Hudi, Apache iceberg. Beam can use
DataLakeIO to read data from data lake, and write data to data lake. We did
not find data lake IO on https://beam.apache.org/documentation/io/built-in/,
we want to contribute this new IO connector to Beam, what should we do
next? Thank you very much!


Re: Java compatibility checking library

2022-08-29 Thread Kenneth Knowles
I think this is still worth considering. We would need to see how well it
does at avoiding false alarms. It was brought up by Andrew a while back at
https://lists.apache.org/thread/kq2w6rw0oorcj037ygjvfsqw1l9l55jj and I know
we had another discussion of it even further back. I thought also that the
Flink project had discussed it and had some learnings, but I have failed to
find the thread.

We'll need it to be configurable to exclude experimental and internal
stuff. And we will need to be a lot more careful about making sure things
go through an incubation phase than we usually are.

Kenn

On Mon, Aug 29, 2022 at 1:23 PM Pablo Estrada via dev 
wrote:

> Hi folks!
> I learned about this library:
> https://lvc.github.io/japi-compliance-checker/
>
> It looks like it's able to check API compatibility between versions of
> libraries. We could instate checks that use this library to be more aware
> of backwards incompatibilities that we may introduce accidentally, and at
> least be forced to make a deliberate decision about them.
>
> LMK what you think about adding checks with this utility : )
>
> -P.
>


Re: Checkpointing on Google Cloud Dataflow Runner

2022-08-29 Thread Kenneth Knowles
Hi Will, David,

I think you'll find the best source of answer for this sort of question on
the user@beam list. I've put that in the To: line with a BCC: to the
dev@beam list so everyone knows they can find the thread there. If I have
misunderstood, and your question has to do with building Beam itself, feel
free to move it back.

Kenn

On Mon, Aug 29, 2022 at 2:24 PM Will Baker  wrote:

> Hello!
>
> I am wondering about using checkpoints with Beam running on Google
> Cloud Dataflow.
>
> The docs indicate that checkpoints are not supported by Google Cloud
> Dataflow:
> https://beam.apache.org/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/
>
> Is there a recommended approach to handling checkpointing on Google
> Cloud Dataflow when using streaming sources like Kinesis and Kafka, so
> that a pipeline could be resumed from where it left off if it needs to
> be stopped or crashes for some reason?
>
> Thanks!
> Will Baker
>


Checkpointing on Google Cloud Dataflow Runner

2022-08-29 Thread Will Baker
Hello!

I am wondering about using checkpoints with Beam running on Google
Cloud Dataflow.

The docs indicate that checkpoints are not supported by Google Cloud
Dataflow:  
https://beam.apache.org/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/

Is there a recommended approach to handling checkpointing on Google
Cloud Dataflow when using streaming sources like Kinesis and Kafka, so
that a pipeline could be resumed from where it left off if it needs to
be stopped or crashes for some reason?

Thanks!
Will Baker


Java compatibility checking library

2022-08-29 Thread Pablo Estrada via dev
Hi folks!
I learned about this library: https://lvc.github.io/japi-compliance-checker/

It looks like it's able to check API compatibility between versions of
libraries. We could instate checks that use this library to be more aware
of backwards incompatibilities that we may introduce accidentally, and at
least be forced to make a deliberate decision about them.

LMK what you think about adding checks with this utility : )

-P.


[JmsIO] => Pull Request to fix message acknowledgement issue

2022-08-29 Thread BALLADA Vincent
Hi all,

Here is a PR related to the following issue (Runner acknowledges messages on 
closed session):
https://github.com/apache/beam/issues/20814

And here is a documentation explaining the fix:
https://docs.google.com/document/d/19HiNPoJeIlzCFyWGdlw7WEFutceL2AmN_5Z0Vmi1s-I/edit?usp=sharing

And finally the PR:
https://github.com/apache/beam/pull/22932

Regards

Vincent BALLADA




Confidential C
-- Disclaimer  
Ce message ainsi que les eventuelles pieces jointes constituent une 
correspondance privee et confidentielle a l'attention exclusive du destinataire 
designe ci-dessus. Si vous n'etes pas le destinataire du present message ou une 
personne susceptible de pouvoir le lui delivrer, il vous est signifie que toute 
divulgation, distribution ou copie de cette transmission est strictement 
interdite. Si vous avez recu ce message par erreur, nous vous remercions d'en 
informer l'expediteur par telephone ou de lui retourner le present message, 
puis d'effacer immediatement ce message de votre systeme.

*** This e-mail and any attachments is a confidential correspondence intended 
only for use of the individual or entity named above. If you are not the 
intended recipient or the agent responsible for delivering the message to the 
intended recipient, you are hereby notified that any disclosure, distribution 
or copying of this communication is strictly prohibited. If you have received 
this communication in error, please notify the sender by phone or by replying 
this message, and then delete this message from your system.


Re: [Question] Progress on BEAM-9923 and how can I help?

2022-08-29 Thread Danny McCormick via dev
Hey Ronoaldo,

I believe some investigation has been done on adding a Go expansion service
(written up here
),
but no concrete work has been done yet. I don't know of any general
guidelines around building a new expansion service, though others here may.

Given that a few different approaches have been discussed, I would
recommend trying to get consensus on an overall approach before putting
together a PR via a design doc (or in this thread).

Thanks,
Danny

On Sun, Aug 28, 2022 at 9:31 AM Ronoaldo Pereira  wrote:

> Hi!
>
> I'm wondering if any progress was made on
> https://issues.apache.org/jira/browse/BEAM-9923 (
> https://github.com/apache/beam/issues/21767), and if not, is there any
> general guidelines for developing the expansion service that are relevant
> for me to draft a PR?
>
> Best regards,
>
> 
>
> Ronoaldo Pereira
>
> Google Cloud Authorized Trainer
>
> ronoa...@arki1.com / +55 (11) 98703-0927 <+55%2011%2098703-0927>
>
> Arki1  - Google Cloud Authorized Training Partner
>
> 
> 
> 
> 
>


Beam High Priority Issue Report (70)

2022-08-29 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is failing
https://github.com/apache/beam/issues/22779 [Bug]: SpannerIO.readChangeStream() 
stops forwarding change records and starts continuously throwing (large number) 
of Operation ongoing errors 
https://github.com/apache/beam/issues/22749 [Bug]: Bytebuddy version update 
causes Invisible parameter type error
https://github.com/apache/beam/issues/22743 [Bug]: Test flake: 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRowCountLimits
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow 
SideInput LoadTests failing
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and 
fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner 
test items cost exactly 5 seconds to run
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer 
whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 
failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in 
beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam 
PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 
failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 
--dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with 
writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing 
new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21468 
beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load 
tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on 
failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable 
ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in 
beam_PostCommit_Java_DataflowV2  
https://github.com/apache/beam/issues/21270 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being 
accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate 
BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked 
Dataflow Pipeline 
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky

Re: SingleStore IO

2022-08-29 Thread Adalbert Makarovych
Thanks for the answer!

It would be great to have a call with you.
Here is a meeting invitation.

SingleStore Beam connector discussion
Wednesday, August 31 · 7:00 – 8:00pm
Google Meet joining info
Video call link: https://meet.google.com/uiq-btvt-tpw
Or dial: ‪(GB) +44 20 3956 6918‬ PIN: ‪226 926 316‬#
More phone numbers: https://tel.meet/uiq-btvt-tpw?pin=2285528438746

I don't know in what timezone you are, so please email if this time is not
suitable for you.

On Thu, Aug 25, 2022 at 7:33 AM John Casey via dev 
wrote:

> Hi Adalbert,
>
> The nature of scheduling work with splittable DoFns is such that trying to
> start all splits at the same time isn't really supported. In addition, the
> general assumption of splitting work in Beam is that a split can be retried
> in isolation from other splits, which doesn't look supported by SingleStore
> parallel read.
>
> That said, this looks really promising, so I'd be happy to get on a call
> to help better understand your design, and see if we can find a solution.
>
> John
>
> On Thu, Aug 25, 2022 at 10:16 AM Adalbert Makarovych <
> amakarovych...@singlestore.com> wrote:
>
>> Hello,
>>
>> I'm working on the SingleStore IO connector and would like to discuss it
>> with Beam developers.
>> It would be great if the connector can use SingleStore parallel read
>> .
>> In the ideal case, the connector should use Single-read mode as it is
>> faster than Multiple-read and consumes much less memory.
>>
>> One of the problems is that in Single-read mode, each reader must
>> initiate its read query before any readers will receive data. Is it
>> possible to somehow configure Beam to start all DoFns at the same time? Or
>> to get the numbers of started DoFns at the runtime?
>>
>> The other problem is that Single-read allows reading data from partition
>> only once, so if one reading thread failed - all others should be restarted
>> to retry. Is it possible to achieve this behavior? Or to at least
>> gracefully fail without additional retries?
>>
>> Here are the first drafts of the design documentation
>> 
>> .
>> I would appreciate any help with this stuff :)
>>
>> --
>> Adalbert Makarovych
>> Software Engineer at SingleStore
>>
>>
>> 
>>
>

-- 
Adalbert Makarovych
Software Engineer at SingleStore