Re: Registering Protobuf schema

2020-08-19 Thread Luke Cwik
With providers there is also an ordering issue since multiple providers
could work for a given type so we want to apply them using some stable
ordering.

On Wed, Aug 19, 2020 at 10:08 AM Brian Hulette  wrote:

> Ah yes, the SchemaRegistry and SchemaProvider follow the same model, but
> none of the SchemaProviders are registered by default. Users can register
> the proto schema provider with
> registerSchemaProvider(Class) [1]:
>
>   p.getSchemaRegistry().registerSchemaProvider(ProtoMessageSchema.class);
>
> Then SchemaCoder should be used for all proto classes.
> We could use ServiceLoader to register all schema providers, then users
> wouldn't need to do this. I assume the reason we don't already is because
> schemas are still experimental and we want it to be opt-in.
>
> [1]
> https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/schemas/SchemaRegistry.html#registerSchemaProvider-org.apache.beam.sdk.schemas.SchemaProvider-
> <https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/schemas/SchemaRegistry.html#registerSchemaProvider-org.apache.beam.sdk.schemas.SchemaProvider->
>
> On Wed, Aug 19, 2020 at 8:44 AM Luke Cwik  wrote:
>
>> Brian, Coders have a provider model where the provider can be queried to
>> resolve for a given type and the providers are resolved in a specific
>> order. This gave the flexibility to handle situations like the one you
>> described.
>>
>> On Wed, Aug 19, 2020 at 12:30 AM 
>> wrote:
>>
>>> Hi Brian,
>>>
>>>
>>>
>>> Many thanks for your mail.
>>>
>>>
>>>
>>> Yes I figured that one out in the end from the docs, but many thanks for
>>> confirming.
>>>
>>>
>>>
>>> I did subsequently discover some other issues with protoBuf-derived
>>> schemas (essentially they don’t seem to be properly supported by
>>> BigQueryIO.Write or allow for optional fields) but I posted a separate
>>> message on the dev channel covering this.
>>>
>>>
>>>
>>> Kind regards,
>>>
>>>
>>>
>>> Rob
>>>
>>>
>>>
>>> *From:* Brian Hulette [mailto:bhule...@google.com]
>>> *Sent:* 18 August 2020 20:50
>>> *To:* user
>>> *Subject:* Re: Registering Protobuf schema
>>>
>>>
>>>
>>>
>>> *
>>> "This is an external email. Do you know who has sent it? Can you be sure
>>> that any links and attachments contained within it are safe? If in any
>>> doubt, use the Phishing Reporter Button in your Outlook client or forward
>>> the email as an attachment to ~ I've Been Phished"
>>> *
>>>
>>> Hi Robert,
>>> Sorry for the late reply on this. I think you should be able to do this
>>> by registering it in your pipeline's SchemaRegistry manually, like so:
>>>
>>>
>>>
>>>   Pipeline p;
>>>
>>>   p.getSchemaRegistry().registerSchemaProvider(Fx.class,
>>> ProtoMessageSchema.class);
>>>
>>> Of course this isn't quite as nice as just adding the DefualtSchema
>>> annotation to a class you control. Maybe we should consider some global
>>> config that would always use schemas for proto-generated classes.
>>>
>>>
>>> Brian
>>>
>>>
>>>
>>> On Sun, Jul 12, 2020 at 9:56 AM Kaymak, Tobias 
>>> wrote:
>>>
>>> This sounds like it is related to the problem I'm trying to solve. (In
>>> my case having a Java POJO containing a protobuf backed-class and trying to
>>> generate a Beam Schema from it.)
>>>
>>> I would be very interested to a solution to this as well :)
>>>
>>>
>>>
>>> On Tue, Jul 7, 2020 at 2:22 PM 
>>> wrote:
>>>
>>> Hi All,
>>>
>>>
>>>
>>> I have a BEAM pipeline where I am reading data from some parquet files
>>> and converting them into a different format based on protobuf generated
>>> classes.
>>>
>>>
>>>
>>> I wish to associate a schema (derived from the protobuf classes) for my
>>> PCollections.  What is the appropriate way to do this with
>>> protobuf-generated classes?
>>>
>>>
>>>
>>> Code excerpt:
>>>
>>>
>>>
>>> PCollection result = input.apply("FXFilePattern", FileIO.*match*
>>> ().filepattern(fxDataFilePa

Re: Registering Protobuf schema

2020-08-19 Thread Brian Hulette
Ah yes, the SchemaRegistry and SchemaProvider follow the same model, but
none of the SchemaProviders are registered by default. Users can register
the proto schema provider with
registerSchemaProvider(Class) [1]:

  p.getSchemaRegistry().registerSchemaProvider(ProtoMessageSchema.class);

Then SchemaCoder should be used for all proto classes.
We could use ServiceLoader to register all schema providers, then users
wouldn't need to do this. I assume the reason we don't already is because
schemas are still experimental and we want it to be opt-in.

[1]
https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/schemas/SchemaRegistry.html#registerSchemaProvider-org.apache.beam.sdk.schemas.SchemaProvider-
<https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/schemas/SchemaRegistry.html#registerSchemaProvider-org.apache.beam.sdk.schemas.SchemaProvider->

On Wed, Aug 19, 2020 at 8:44 AM Luke Cwik  wrote:

> Brian, Coders have a provider model where the provider can be queried to
> resolve for a given type and the providers are resolved in a specific
> order. This gave the flexibility to handle situations like the one you
> described.
>
> On Wed, Aug 19, 2020 at 12:30 AM 
> wrote:
>
>> Hi Brian,
>>
>>
>>
>> Many thanks for your mail.
>>
>>
>>
>> Yes I figured that one out in the end from the docs, but many thanks for
>> confirming.
>>
>>
>>
>> I did subsequently discover some other issues with protoBuf-derived
>> schemas (essentially they don’t seem to be properly supported by
>> BigQueryIO.Write or allow for optional fields) but I posted a separate
>> message on the dev channel covering this.
>>
>>
>>
>> Kind regards,
>>
>>
>>
>> Rob
>>
>>
>>
>> *From:* Brian Hulette [mailto:bhule...@google.com]
>> *Sent:* 18 August 2020 20:50
>> *To:* user
>> *Subject:* Re: Registering Protobuf schema
>>
>>
>>
>>
>> *
>> "This is an external email. Do you know who has sent it? Can you be sure
>> that any links and attachments contained within it are safe? If in any
>> doubt, use the Phishing Reporter Button in your Outlook client or forward
>> the email as an attachment to ~ I've Been Phished"
>> *
>>
>> Hi Robert,
>> Sorry for the late reply on this. I think you should be able to do this
>> by registering it in your pipeline's SchemaRegistry manually, like so:
>>
>>
>>
>>   Pipeline p;
>>
>>   p.getSchemaRegistry().registerSchemaProvider(Fx.class,
>> ProtoMessageSchema.class);
>>
>> Of course this isn't quite as nice as just adding the DefualtSchema
>> annotation to a class you control. Maybe we should consider some global
>> config that would always use schemas for proto-generated classes.
>>
>>
>> Brian
>>
>>
>>
>> On Sun, Jul 12, 2020 at 9:56 AM Kaymak, Tobias 
>> wrote:
>>
>> This sounds like it is related to the problem I'm trying to solve. (In my
>> case having a Java POJO containing a protobuf backed-class and trying to
>> generate a Beam Schema from it.)
>>
>> I would be very interested to a solution to this as well :)
>>
>>
>>
>> On Tue, Jul 7, 2020 at 2:22 PM  wrote:
>>
>> Hi All,
>>
>>
>>
>> I have a BEAM pipeline where I am reading data from some parquet files
>> and converting them into a different format based on protobuf generated
>> classes.
>>
>>
>>
>> I wish to associate a schema (derived from the protobuf classes) for my
>> PCollections.  What is the appropriate way to do this with
>> protobuf-generated classes?
>>
>>
>>
>> Code excerpt:
>>
>>
>>
>> PCollection result = input.apply("FXFilePattern", FileIO.*match*
>> ().filepattern(fxDataFilePattern))
>> .apply("FXReadMatches", FileIO.*readMatches*())
>> .apply("FXReadParquetFile", ParquetIO.*readFiles*(fxAvroSchema))
>> .apply("MapFXToProto", ParDo.*of*(MapFXToProto.*of*()));
>> boolean hasSchema = result.hasSchema();  // returns false
>>
>>
>>
>> With thanks in advance.
>>
>>
>>
>> Kind regards,
>>
>>
>>
>> Rob
>>
>>
>>
>> *Robert Butcher*
>>
>> *Technical Architect | Foundry/SRS | NatWest Markets*
>>
>> WeWork, 10 Devonshire Square, London, EC2M 4AE
>>
>> Mobile +44 (0) 7414 730866 <+44%207414%20730866>

Re: Registering Protobuf schema

2020-08-19 Thread Luke Cwik
Brian, Coders have a provider model where the provider can be queried to
resolve for a given type and the providers are resolved in a specific
order. This gave the flexibility to handle situations like the one you
described.

On Wed, Aug 19, 2020 at 12:30 AM  wrote:

> Hi Brian,
>
>
>
> Many thanks for your mail.
>
>
>
> Yes I figured that one out in the end from the docs, but many thanks for
> confirming.
>
>
>
> I did subsequently discover some other issues with protoBuf-derived
> schemas (essentially they don’t seem to be properly supported by
> BigQueryIO.Write or allow for optional fields) but I posted a separate
> message on the dev channel covering this.
>
>
>
> Kind regards,
>
>
>
> Rob
>
>
>
> *From:* Brian Hulette [mailto:bhule...@google.com]
> *Sent:* 18 August 2020 20:50
> *To:* user
> *Subject:* Re: Registering Protobuf schema
>
>
>
>
> *
> "This is an external email. Do you know who has sent it? Can you be sure
> that any links and attachments contained within it are safe? If in any
> doubt, use the Phishing Reporter Button in your Outlook client or forward
> the email as an attachment to ~ I've Been Phished"
> *
>
> Hi Robert,
> Sorry for the late reply on this. I think you should be able to do this by
> registering it in your pipeline's SchemaRegistry manually, like so:
>
>
>
>   Pipeline p;
>
>   p.getSchemaRegistry().registerSchemaProvider(Fx.class,
> ProtoMessageSchema.class);
>
> Of course this isn't quite as nice as just adding the DefualtSchema
> annotation to a class you control. Maybe we should consider some global
> config that would always use schemas for proto-generated classes.
>
>
> Brian
>
>
>
> On Sun, Jul 12, 2020 at 9:56 AM Kaymak, Tobias 
> wrote:
>
> This sounds like it is related to the problem I'm trying to solve. (In my
> case having a Java POJO containing a protobuf backed-class and trying to
> generate a Beam Schema from it.)
>
> I would be very interested to a solution to this as well :)
>
>
>
> On Tue, Jul 7, 2020 at 2:22 PM  wrote:
>
> Hi All,
>
>
>
> I have a BEAM pipeline where I am reading data from some parquet files and
> converting them into a different format based on protobuf generated classes.
>
>
>
> I wish to associate a schema (derived from the protobuf classes) for my
> PCollections.  What is the appropriate way to do this with
> protobuf-generated classes?
>
>
>
> Code excerpt:
>
>
>
> PCollection result = input.apply("FXFilePattern", FileIO.*match*
> ().filepattern(fxDataFilePattern))
> .apply("FXReadMatches", FileIO.*readMatches*())
> .apply("FXReadParquetFile", ParquetIO.*readFiles*(fxAvroSchema))
> .apply("MapFXToProto", ParDo.*of*(MapFXToProto.*of*()));
> boolean hasSchema = result.hasSchema();  // returns false
>
>
>
> With thanks in advance.
>
>
>
> Kind regards,
>
>
>
> Rob
>
>
>
> *Robert Butcher*
>
> *Technical Architect | Foundry/SRS | NatWest Markets*
>
> WeWork, 10 Devonshire Square, London, EC2M 4AE
>
> Mobile +44 (0) 7414 730866 <+44%207414%20730866>
>
>
>
> This email is classified as *CONFIDENTIAL* unless otherwise stated.
>
>
>
>
>
> This communication and any attachments are confidential and intended
> solely for the addressee. If you are not the intended recipient please
> advise us immediately and delete it. Unless specifically stated in the
> message or otherwise indicated, you may not duplicate, redistribute or
> forward this message and any attachments are not intended for distribution
> to, or use by any person or entity in any jurisdiction or country where
> such distribution or use would be contrary to local law or regulation.
> NatWest Markets Plc  or any affiliated entity ("NatWest Markets") accepts
> no responsibility for any changes made to this message after it was sent.
>
> Unless otherwise specifically indicated, the contents of this
> communication and its attachments are for information purposes only and
> should not be regarded as an offer or solicitation to buy or sell a product
> or service, confirmation of any transaction, a valuation, indicative price
> or an official statement. Trading desks may have a position or interest
> that is inconsistent with any views expressed in this message. In
> evaluating the information contained in this message, you should know that
> it could have been previously provided to other clients and/or internal
> NatWest Markets personnel, who could have already 

Re: Registering Protobuf schema

2020-08-18 Thread Brian Hulette
Hi Robert,
Sorry for the late reply on this. I think you should be able to do this by
registering it in your pipeline's SchemaRegistry manually, like so:

  Pipeline p;
  p.getSchemaRegistry().registerSchemaProvider(Fx.class,
ProtoMessageSchema.class);

Of course this isn't quite as nice as just adding the DefualtSchema
annotation to a class you control. Maybe we should consider some global
config that would always use schemas for proto-generated classes.

Brian

On Sun, Jul 12, 2020 at 9:56 AM Kaymak, Tobias 
wrote:

> This sounds like it is related to the problem I'm trying to solve. (In my
> case having a Java POJO containing a protobuf backed-class and trying to
> generate a Beam Schema from it.)
>
> I would be very interested to a solution to this as well :)
>
> On Tue, Jul 7, 2020 at 2:22 PM  wrote:
>
>> Hi All,
>>
>>
>>
>> I have a BEAM pipeline where I am reading data from some parquet files
>> and converting them into a different format based on protobuf generated
>> classes.
>>
>>
>>
>> I wish to associate a schema (derived from the protobuf classes) for my
>> PCollections.  What is the appropriate way to do this with
>> protobuf-generated classes?
>>
>>
>>
>> Code excerpt:
>>
>>
>>
>> PCollection result = input.apply("FXFilePattern", FileIO.*match*
>> ().filepattern(fxDataFilePattern))
>> .apply("FXReadMatches", FileIO.*readMatches*())
>> .apply("FXReadParquetFile", ParquetIO.*readFiles*(fxAvroSchema))
>> .apply("MapFXToProto", ParDo.*of*(MapFXToProto.*of*()));
>> boolean hasSchema = result.hasSchema();  // returns false
>>
>>
>>
>> With thanks in advance.
>>
>>
>>
>> Kind regards,
>>
>>
>>
>> Rob
>>
>>
>>
>> *Robert Butcher*
>>
>> *Technical Architect | Foundry/SRS | NatWest Markets*
>>
>> WeWork, 10 Devonshire Square, London, EC2M 4AE
>>
>> Mobile +44 (0) 7414 730866 <+44%207414%20730866>
>>
>>
>>
>> This email is classified as *CONFIDENTIAL* unless otherwise stated.
>>
>>
>>
>> This communication and any attachments are confidential and intended
>> solely for the addressee. If you are not the intended recipient please
>> advise us immediately and delete it. Unless specifically stated in the
>> message or otherwise indicated, you may not duplicate, redistribute or
>> forward this message and any attachments are not intended for distribution
>> to, or use by any person or entity in any jurisdiction or country where
>> such distribution or use would be contrary to local law or regulation.
>> NatWest Markets Plc  or any affiliated entity ("NatWest Markets") accepts
>> no responsibility for any changes made to this message after it was sent.
>> Unless otherwise specifically indicated, the contents of this
>> communication and its attachments are for information purposes only and
>> should not be regarded as an offer or solicitation to buy or sell a product
>> or service, confirmation of any transaction, a valuation, indicative price
>> or an official statement. Trading desks may have a position or interest
>> that is inconsistent with any views expressed in this message. In
>> evaluating the information contained in this message, you should know that
>> it could have been previously provided to other clients and/or internal
>> NatWest Markets personnel, who could have already acted on it.
>> NatWest Markets cannot provide absolute assurances that all electronic
>> communications (sent or received) are secure, error free, not corrupted,
>> incomplete or virus free and/or that they will not be lost, mis-delivered,
>> destroyed, delayed or intercepted/decrypted by others. Therefore NatWest
>> Markets disclaims all liability with regards to electronic communications
>> (and the contents therein) if they are corrupted, lost destroyed, delayed,
>> incomplete, mis-delivered, intercepted, decrypted or otherwise
>> misappropriated by others.
>> Any electronic communication that is conducted within or through NatWest
>> Markets systems will be subject to being archived, monitored and produced
>> to regulators and in litigation in accordance with NatWest Markets’ policy
>> and local laws, rules and regulations. Unless expressly prohibited by local
>> law, electronic communications may be archived in countries other than the
>> country in which you are located, and may be treated in accordance with the
>> laws and regulations of the country of each individual included in the
>> entire chain.
>> Copyright NatWest Markets Plc. All rights reserved. See
>> https://www.nwm.com/disclaimer for further risk disclosure.
>>
>


Re: Registering Protobuf schema

2020-07-12 Thread Kaymak, Tobias
This sounds like it is related to the problem I'm trying to solve. (In my
case having a Java POJO containing a protobuf backed-class and trying to
generate a Beam Schema from it.)

I would be very interested to a solution to this as well :)

On Tue, Jul 7, 2020 at 2:22 PM  wrote:

> Hi All,
>
>
>
> I have a BEAM pipeline where I am reading data from some parquet files and
> converting them into a different format based on protobuf generated classes.
>
>
>
> I wish to associate a schema (derived from the protobuf classes) for my
> PCollections.  What is the appropriate way to do this with
> protobuf-generated classes?
>
>
>
> Code excerpt:
>
>
>
> PCollection result = input.apply("FXFilePattern", FileIO.*match*
> ().filepattern(fxDataFilePattern))
> .apply("FXReadMatches", FileIO.*readMatches*())
> .apply("FXReadParquetFile", ParquetIO.*readFiles*(fxAvroSchema))
> .apply("MapFXToProto", ParDo.*of*(MapFXToProto.*of*()));
> boolean hasSchema = result.hasSchema();  // returns false
>
>
>
> With thanks in advance.
>
>
>
> Kind regards,
>
>
>
> Rob
>
>
>
> *Robert Butcher*
>
> *Technical Architect | Foundry/SRS | NatWest Markets*
>
> WeWork, 10 Devonshire Square, London, EC2M 4AE
>
> Mobile +44 (0) 7414 730866
>
>
>
> This email is classified as *CONFIDENTIAL* unless otherwise stated.
>
>
>
> This communication and any attachments are confidential and intended
> solely for the addressee. If you are not the intended recipient please
> advise us immediately and delete it. Unless specifically stated in the
> message or otherwise indicated, you may not duplicate, redistribute or
> forward this message and any attachments are not intended for distribution
> to, or use by any person or entity in any jurisdiction or country where
> such distribution or use would be contrary to local law or regulation.
> NatWest Markets Plc  or any affiliated entity ("NatWest Markets") accepts
> no responsibility for any changes made to this message after it was sent.
> Unless otherwise specifically indicated, the contents of this
> communication and its attachments are for information purposes only and
> should not be regarded as an offer or solicitation to buy or sell a product
> or service, confirmation of any transaction, a valuation, indicative price
> or an official statement. Trading desks may have a position or interest
> that is inconsistent with any views expressed in this message. In
> evaluating the information contained in this message, you should know that
> it could have been previously provided to other clients and/or internal
> NatWest Markets personnel, who could have already acted on it.
> NatWest Markets cannot provide absolute assurances that all electronic
> communications (sent or received) are secure, error free, not corrupted,
> incomplete or virus free and/or that they will not be lost, mis-delivered,
> destroyed, delayed or intercepted/decrypted by others. Therefore NatWest
> Markets disclaims all liability with regards to electronic communications
> (and the contents therein) if they are corrupted, lost destroyed, delayed,
> incomplete, mis-delivered, intercepted, decrypted or otherwise
> misappropriated by others.
> Any electronic communication that is conducted within or through NatWest
> Markets systems will be subject to being archived, monitored and produced
> to regulators and in litigation in accordance with NatWest Markets’ policy
> and local laws, rules and regulations. Unless expressly prohibited by local
> law, electronic communications may be archived in countries other than the
> country in which you are located, and may be treated in accordance with the
> laws and regulations of the country of each individual included in the
> entire chain.
> Copyright NatWest Markets Plc. All rights reserved. See
> https://www.nwm.com/disclaimer for further risk disclosure.
>


Registering Protobuf schema

2020-07-07 Thread Robert.Butcher
Hi All,

I have a BEAM pipeline where I am reading data from some parquet files and 
converting them into a different format based on protobuf generated classes.

I wish to associate a schema (derived from the protobuf classes) for my 
PCollections.  What is the appropriate way to do this with protobuf-generated 
classes?

Code excerpt:

PCollection result = input.apply("FXFilePattern", 
FileIO.match().filepattern(fxDataFilePattern))
.apply("FXReadMatches", FileIO.readMatches())
.apply("FXReadParquetFile", ParquetIO.readFiles(fxAvroSchema))
.apply("MapFXToProto", ParDo.of(MapFXToProto.of()));
boolean hasSchema = result.hasSchema();  // returns false

With thanks in advance.

Kind regards,

Rob

Robert Butcher
Technical Architect | Foundry/SRS | NatWest Markets
WeWork, 10 Devonshire Square, London, EC2M 4AE
Mobile +44 (0) 7414 730866

This email is classified as CONFIDENTIAL unless otherwise stated.



This communication and any attachments are confidential and intended solely for 
the addressee. If you are not the intended recipient please advise us 
immediately and delete it. Unless specifically stated in the message or 
otherwise indicated, you may not duplicate, redistribute or forward this 
message and any attachments are not intended for distribution to, or use by any 
person or entity in any jurisdiction or country where such distribution or use 
would be contrary to local law or regulation. NatWest Markets Plc  or any 
affiliated entity ("NatWest Markets") accepts no responsibility for any changes 
made to this message after it was sent.

Unless otherwise specifically indicated, the contents of this communication and 
its attachments are for information purposes only and should not be regarded as 
an offer or solicitation to buy or sell a product or service, confirmation of 
any transaction, a valuation, indicative price or an official statement. 
Trading desks may have a position or interest that is inconsistent with any 
views expressed in this message. In evaluating the information contained in 
this message, you should know that it could have been previously provided to 
other clients and/or internal NatWest Markets personnel, who could have already 
acted on it.

NatWest Markets cannot provide absolute assurances that all electronic 
communications (sent or received) are secure, error free, not corrupted, 
incomplete or virus free and/or that they will not be lost, mis-delivered, 
destroyed, delayed or intercepted/decrypted by others. Therefore NatWest 
Markets disclaims all liability with regards to electronic communications (and 
the contents therein) if they are corrupted, lost destroyed, delayed, 
incomplete, mis-delivered, intercepted, decrypted or otherwise misappropriated 
by others.

Any electronic communication that is conducted within or through NatWest 
Markets systems will be subject to being archived, monitored and produced to 
regulators and in litigation in accordance with NatWest Markets’ policy and 
local laws, rules and regulations. Unless expressly prohibited by local law, 
electronic communications may be archived in countries other than the country 
in which you are located, and may be treated in accordance with the laws and 
regulations of the country of each individual included in the entire chain.

Copyright NatWest Markets Plc. All rights reserved. See 
https://www.nwm.com/disclaimer for further risk disclosure.