On Sun, Jan 9, 2022 at 3:36 PM Reuven Lax <[email protected]> wrote:
>
> On Sun, Jan 9, 2022 at 3:10 PM gaurav mishra <[email protected]> wrote:
>
>> I think I can make it work now. I found a utility method for building my
>> coder from the class. Something like:
>>
>>     Class<Data> dataClass = userConfig.getDataClass();
>>     Coder<Data> dataCoder = SchemaCoder.of(
>>         schemaRegistry.getSchema(dataClass),
>>         TypeDescriptor.of(dataClass),
>>         schemaRegistry.getToRowFunction(dataClass),
>>         schemaRegistry.getFromRowFunction(dataClass));
>>
> This will work. Though, did annotating the POJO like I said not work?

No, annotation alone does not work, since I am not using concrete classes in
the code where the pipeline is constructed. <Data> above is a generic type
parameter of the class that constructs the pipeline.
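[A fuller sketch of the registry-based construction above. This is a hedged illustration, not the poster's actual code: `SchemaCoders` and `ExampleData` are hypothetical names, and it assumes the class's schema is known to the pipeline's SchemaRegistry (e.g. via @DefaultSchema).]

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.NoSuchSchemaException;
import org.apache.beam.sdk.schemas.SchemaCoder;
import org.apache.beam.sdk.schemas.SchemaRegistry;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.values.TypeDescriptor;

// Illustrative POJO; the annotation lets the SchemaRegistry infer its schema.
@DefaultSchema(JavaFieldSchema.class)
class ExampleData {
  public String id;
  public long timestamp;
}

// Hypothetical helper (the name SchemaCoders is not Beam API): builds a
// SchemaCoder for any class whose schema the pipeline's SchemaRegistry can
// provide, mirroring the snippet in the message above.
class SchemaCoders {
  static <T> Coder<T> of(Pipeline pipeline, Class<T> clazz) throws NoSuchSchemaException {
    SchemaRegistry registry = pipeline.getSchemaRegistry();
    return SchemaCoder.of(
        registry.getSchema(clazz),           // Beam Schema inferred for the class
        TypeDescriptor.of(clazz),
        registry.getToRowFunction(clazz),    // T -> Row
        registry.getFromRowFunction(clazz)); // Row -> T
  }
}
```

With something like this, `.setCoder(SchemaCoders.of(pipeline, dataClass))` can replace the AvroCoder call even when the concrete class is only known at runtime.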
>
>> On Sun, Jan 9, 2022 at 2:14 PM gaurav mishra <[email protected]> wrote:
>>
>>> Removing the setCoder call breaks my pipeline:
>>>
>>>     No Coder has been manually specified; you may do so using .setCoder().
>>>     Inferring a Coder from the CoderRegistry failed: Unable to provide a
>>>     Coder for Data.
>>>     Building a Coder using a registered CoderProvider failed.
>>>
>>> The reason is that the code which builds the pipeline is based on Java
>>> generics: the actual pipeline-building code sets a bunch of parameters
>>> which are used to construct the pipeline.
>>>
>>>     PCollection<Data> stream = pipeline.apply(userProvidedTransform)
>>>         .get(outputTag)
>>>         .setCoder(userProvidedCoder);
>>>
>>> So I guess I will need to provide some more information to the framework
>>> to make the annotation work.
>>>
>>> On Sun, Jan 9, 2022 at 1:39 PM Reuven Lax <[email protected]> wrote:
>>>
>>>> If you annotate your POJO with @DefaultSchema(JavaFieldSchema.class),
>>>> that will usually set up schema inference automatically (you'll have to
>>>> remove the setCoder call).
>>>>
>>>> On Sun, Jan 9, 2022 at 1:32 PM gaurav mishra <[email protected]> wrote:
>>>>
>>>>> How do I set up my pipeline to use Beam's schema encoding?
>>>>> In my current code I am doing something like this:
>>>>>
>>>>>     PCollection<Data> stream = pipeline.apply(someTransform)
>>>>>         .get(outputTag)
>>>>>         .setCoder(AvroCoder.of(Data.class));
>>>>>
>>>>> On Sun, Jan 9, 2022 at 1:16 PM Reuven Lax <[email protected]> wrote:
>>>>>
>>>>>> I don't think we make any guarantees about AvroCoder. Can you use
>>>>>> Beam's schema encoding instead?
>>>>>>
>>>>>> On Sun, Jan 9, 2022 at 1:14 PM gaurav mishra <[email protected]> wrote:
>>>>>>
>>>>>>> Is there a way to programmatically check for compatibility? I would
>>>>>>> like to fail my unit tests if incompatible changes are made to the
>>>>>>> POJO.
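[The unit-test idea in the last question above can be sketched with Avro's SchemaCompatibility API, assuming the previously deployed schema has been saved somewhere as JSON. `SchemaCompatCheck` is a hypothetical name; the API calls are standard Avro.]

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

// Sketch of a programmatic check: can the new schema (reader) still decode
// data written with the previously deployed schema (writer)? A unit test
// can assert this and fail the build on incompatible model changes.
class SchemaCompatCheck {
  static boolean canReadOldData(Schema newSchema, Schema oldSchema) {
    return SchemaCompatibility.checkReaderWriterCompatibility(newSchema, oldSchema)
        .getType() == SchemaCompatibilityType.COMPATIBLE;
  }
}
```

Note that a newly added field must carry a default value for the new reader schema to stay compatible; a reflect-derived schema for a plain new POJO field has no default, which matches the --update failure described in the original question.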
>>>>>>>
>>>>>>> On Fri, Jan 7, 2022 at 4:49 PM Luke Cwik <[email protected]> wrote:
>>>>>>>
>>>>>>>> Check the schema of the Avro encoding for the POJO before and after
>>>>>>>> the change to ensure that they are compatible as you expect.
>>>>>>>>
>>>>>>>> On Fri, Jan 7, 2022 at 4:12 PM gaurav mishra <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> This is more of a Dataflow question, I guess, but I am asking here
>>>>>>>>> in the hope that someone has faced a similar problem and can help.
>>>>>>>>> I am trying to use the "--update" option to update a running
>>>>>>>>> Dataflow job. I am noticing that the compatibility checks fail any
>>>>>>>>> time I add a new field to my data model. The error says:
>>>>>>>>>
>>>>>>>>>     The Coder or type for step XYZ has changed
>>>>>>>>>
>>>>>>>>> I am using a Java POJO for the data and AvroCoder to serialize the
>>>>>>>>> model. I read somewhere that adding new optional fields to the data
>>>>>>>>> should work when updating the pipeline.
>>>>>>>>>
>>>>>>>>> I am fine with changing the coder or the implementation of the
>>>>>>>>> model to something which allows me to update the pipeline when I
>>>>>>>>> add new optional fields to the existing model. Any suggestions?
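[Luke's "check the schema before and after" advice can be sketched as follows. This is an illustration under the assumption that AvroCoder is used as in the question; `FrozenSchema` and `SimpleModel` are hypothetical names. The idea is to freeze the current schema JSON (e.g. commit it as a test resource) so any change to the encoding shows up as an explicit, reviewable diff.]

```java
import org.apache.avro.Schema;
import org.apache.beam.sdk.coders.AvroCoder;

// Placeholder for the real model class.
class SimpleModel {
  public String id;
}

// Sketch: capture the exact Avro schema that AvroCoder derives for the POJO
// via reflection, in canonical JSON form.
class FrozenSchema {
  static String currentSchemaJson(Class<?> pojo) {
    Schema schema = AvroCoder.of(pojo).getSchema(); // schema AvroCoder will use
    return schema.toString();                       // canonical JSON form
  }
}
```

A test can then compare `currentSchemaJson(Data.class)` against the frozen copy, or feed both schemas into a compatibility check, before attempting a Dataflow --update.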
