Re: I/O connectors for streaming GRPC source

2020-09-30 Thread Luke Cwik
There is no generic gRPC connector and it is unlikely that there ever will
be one.

A lot of the time integration with external systems is for ingesting large
amounts of data which works best with certain features which gRPC doesn't
natively support but an application protocol built on top of gRPC usually
does. Things like checkpointing and resuming from a position in the stream,
being able to split streams, acking messages so they aren't published in
the stream,  To learn more, you should take a look at this splittable
DoFn blog[1].

There are a few sources that have been written that use gRPC but Apache
Beam integrates using a higher level application specific protocol. Take a
look at SpannerIO[2] and PubsubLite[3] since they wrap gRPC with their
client libraries.

You can always start by writing a normal DoFn that connects to this service
and eventually migrating to a splittable DoFn once you have scaling
concerns.

1: https://beam.apache.org/blog/splittable-do-fn/
2:
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
3:
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java

On Wed, Sep 30, 2020 at 11:24 AM Maksim Pilipeyko <
maksim.pilipe...@colada.biz> wrote:

> Hi,
>
>
>
> What connector can I use if I should read data from streaming grpc api?
>
>
>
> Best regards,
> Maksim
>


Re: jira ticket BEAM-10990

2020-09-30 Thread Pablo Estrada
Hi Steven,
I've added you as a contributor, so feel free to assign the issue to
yourself.

BigQueryIO has a similar pattern for its write transform. You can do
BigQueryIO.write(...).withExtendedErrorInfo()[1], and it will return a
WriteResult[2], which contains a PCollection of failed inserts with their
errors[3].

You could add a similar option to ElasticSearchIO which determines when to
retry errors, and what to do with them. Ideally, the default behavior would
not change (to ensure backwards compatibility), but when specifying the new
configuration, that would help you address your use case.
By the way, this pattern is very common, so I bet it will be a useful
addition for other users.

If that works for you, we'd be happy to welcome your contribution.
Best
-P.

[1]
https://beam.apache.org/releases/javadoc/2.24.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withExtendedErrorInfo--
[2]
https://beam.apache.org/releases/javadoc/2.24.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html

[3]
https://beam.apache.org/releases/javadoc/2.24.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedInsertsWithErr--

On Wed, Sep 30, 2020 at 11:23 AM Steven Gaunt 
wrote:

> Hi guys
>
> I've created a jira ticket BEAM-10990
>  as we are keen to use
> ElasticSearchIO.
>
> would there be any objections if I was create a branch and propose a fix
> for BEAM-10990  ?
>
> thanks
>
> Steve
>
> This message contains proprietary information from Equifax which may be
> confidential. If you are not an intended recipient, please refrain from any
> disclosure, copying, distribution or use of this information and note that
> such actions are prohibited. If you have received this transmission in
> error, please notify by e-mail postmas...@equifax.com. Equifax® is a
> registered trademark of Equifax Inc. All rights reserved.
>
>


I/O connectors for streaming GRPC source

2020-09-30 Thread Maksim Pilipeyko
Hi,

What connector can I use if I should read data from streaming grpc api?

Best regards,
Maksim


jira ticket BEAM-10990

2020-09-30 Thread Steven Gaunt
Hi guys

I've created a jira ticket BEAM-10990
 as we are keen to use
ElasticSearchIO.

would there be any objections if I was create a branch and propose a fix
for BEAM-10990  ?

thanks

Steve

-- 
This message contains proprietary information from Equifax which may be 
confidential. If you are not an intended recipient, please refrain from any 
disclosure, copying, distribution or use of this information and note that 
such actions are prohibited. If you have received this transmission in 
error, please notify by e-mail postmas...@equifax.com 
. Equifax® is a registered trademark of 
Equifax Inc. All rights reserved.











Re: Behavior change for Gradle build target

2020-09-30 Thread Brian Hulette
I opened up BEAM-10986 [1] to track this, let's move discussion there.

[1] https://issues.apache.org/jira/browse/BEAM-10986

On Tue, Sep 29, 2020 at 2:40 PM Brian Hulette  wrote:

> A bisect found [1] to be the culprit, it upgrades the shadow plugin. I
> suspect it's related to this change in shadow 4.0.4 [2], but I'm not sure:
>
> > When using shadow, application, and maven plugins together, remove
> shadowDistZip and shadowDistTar from
> > configurations.archives so they are not published or installed by
> default with the uploadArchives or install
> > tasks. #347
>
> I think we may want to rollback the shadow plugin upgrade and cherry-pick
> onto the 2.25.0 branch until we can debug.
>
> [1] https://github.com/apache/beam/pull/12821
> [2] https://github.com/johnrengelman/shadow/releases/tag/4.0.4
>
> On Tue, Sep 29, 2020 at 1:45 PM Brian Hulette  wrote:
>
>> I think that output is still incorrect for
>> :sdks:java:io:expansion-service, there should be both shaded and unshaded
>> in ./sdks/java/io/expansion-service/build/libs
>>
>> On Tue, Sep 29, 2020 at 1:08 PM Tyson Hamilton 
>> wrote:
>>
>>> Or, possibly that PR already fixed the issue. Could you sync, retry
>>> again? Here is what I found after running the gradle task you mentioned:
>>>
>>> ttysonjh@tysonjh:~/Development/beam$ find . -name
>>> "*expansion-service*SNAPSHOT*"
>>>
>>> ./sdks/java/expansion-service/build/libs/beam-sdks-java-expansion-service-2.26.0-SNAPSHOT.jar
>>>
>>> ./sdks/java/io/expansion-service/build/libs/beam-sdks-java-io-expansion-service-2.26.0-SNAPSHOT-unshaded.jar
>>>
>>> ./sdks/java/io/expansion-service/build/libs/beam-sdks-java-io-expansion-service-2.26.0-SNAPSHOT-tests-unshaded.jar
>>>
>>> ./sdks/java/io/expansion-service/build/distributions/expansion-service-2.26.0-SNAPSHOT.tar
>>>
>>> ./sdks/java/io/expansion-service/build/distributions/expansion-service-2.26.0-SNAPSHOT.zip
>>>
>>>
>>> On Tue, Sep 29, 2020 at 12:59 PM Tyson Hamilton 
>>> wrote:
>>>
 Hm. It sounds like it is possibly related to the gradle upgrade to 6? I
 had a similar issue that I fixed for the Java nightly snapshot build in
 PR#12947 (https://github.com/apache/beam/pull/12947). It may require
 similar changes to the gradle.build file to generate the jar.

 On Tue, Sep 29, 2020 at 8:20 AM Chamikara Jayalath <
 chamik...@google.com> wrote:

> +Brian Hulette  for any follow ups here since
> I'll be OOO for a few days.
>
> Thanks,
> Cham
>
> On Mon, Sep 28, 2020 at 10:34 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> Hi,
>>
>> I just noticed that the result jars produced by command "./gradlew
>> :sdks:java:io:expansion-service:build" went from both shaded [1] and
>> unshaded [2] jars for Beam 2.24.0 branch to just the unshaded jar [3] for
>> Beam 2.25.0 branch.
>>
>> Does anyone know what resulted in this behaviour change ?
>>
>> Also, does this mean that the jars released for Beam 2.25.0 will be
>> different ? (if so I believe that will break cross-language transforms 
>> that
>> will try to download the shaded jar for expansion).
>>
>> [1] beam-sdks-java-io-expansion-service-2.24.0-SNAPSHOT.jar
>> [2] beam-sdks-java-io-expansion-service-2.24.0-SNAPSHOT-unshaded.jar
>> [3] beam-sdks-java-io-expansion-service-2.25.0-SNAPSHOT-unshaded.jar
>>
>