We have a similar question/issue at my work. Two solutions come to mind:

1. Wrap your inputs, transforms, etc. in functions that you can call and
then chain together

2. Use external libraries that a ParDo class can call. Then you can make
these external libraries flexible and testable.
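
For what it's worth, here is a minimal sketch of both ideas in plain Python,
with no Beam dependency. The names `square_root`, `with_key`, and
`extend_records` are hypothetical, and `square_root` only stands in for
whatever logic the other team's `SquareRoot` wraps; it is not their actual
code.

```python
import math
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[str, float]            # (ID, foo_prop)
Extended = Tuple[str, float, float]   # (ID, foo_prop, sqrt of foo_prop)


def square_root(value: float) -> float:
    # Hypothetical library function: the value-only logic, kept outside any
    # DoFn so it can be unit tested and reused on its own.
    return math.sqrt(value)


def with_key(fn: Callable[[float], float]) -> Callable[[Record], Extended]:
    # Lift a value-only function so the record ID is carried through
    # unchanged and the result can be linked back to its record.
    def keyed(record: Record) -> Extended:
        record_id, value = record
        return (record_id, value, fn(value))

    return keyed


def extend_records(records: Iterable[Record]) -> Iterator[Extended]:
    # Plain-Python stand-in for the pipeline step that applies the
    # wrapped function to every record.
    return map(with_key(square_root), records)


sample = [("a", 4.0), ("b", 9.0)]
extended = list(extend_records(sample))
```

In a Beam Python pipeline the same wrapped function could be handed to
`beam.Map`, e.g. `beam.Map(with_key(square_root))`, assuming the library
function is importable on the workers.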

On Sat, Oct 12, 2024, 12:31 PM Joey Tran <joey.t...@schrodinger.com> wrote:

> Yes. But this is a hypothetical, there could also be many operations you
> might want to do with the initial data.
>
> On Sat, Oct 12, 2024, 1:47 PM Henry Tremblay <paulhtremb...@gmail.com>
> wrote:
>
>> So the only part of the pipeline you need to change is the
>> transformation in the middle, after the read from the DB and before some
>> type of write?
>>
>> On Sat, Oct 12, 2024 at 3:29 AM <trs...@gmail.com> wrote:
>>
>>> Sounds like you want a monad, heh.
>>>
>>> It would be nice if their DoFn took a generic type and you could pass it
>>> a selector func to pick out what they need.
>>> If you can access their DoFn and it is not too complex, perhaps you could
>>> just use their processElement implementation directly?
>>>
>>> eg
>>>
>>> class TheirDoFn ..{ void processElement(...){...} }
>>>
>>> class YourDoFn .. {
>>>   void processElement() {
>>>     new TheirDoFn().processElement(...)
>>>   }
>>> }
>>>
>>> Depending on what annotations they're using in their processElement
>>> func, it could be trickier or not. You could pass in a mock implementation
>>> of OutputReceiver, so you can wrap the results and delegate.
>>>
>>> On Sat, 12 Oct 2024 at 08:51, XQ Hu via user <user@beam.apache.org>
>>> wrote:
>>>
>>>> This sounds like what CDC (Change Data Capture) typically does, which
>>>> usually runs as a streaming pipeline.
>>>>
>>>> On Fri, Oct 11, 2024 at 3:51 PM Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> Another basic pattern question for the user group.
>>>>>
>>>>> Say I have a database of records with an ID and some float property.
>>>>> Another team has written and published a transform `SquareRoot`. I want to
>>>>> write a pipeline that reads this database and outputs extended records that
>>>>> have (ID, foo_prop, squareroot(foo)_prop). How do I do this?
>>>>>
>>>>> Of course I can strip my records of their ID and then pass in the
>>>>> properties straight into `SquareRoot`, but then I have no way to link it
>>>>> back to what record the square root corresponds to. Do I just need to ask
>>>>> the other team to make their SquareRootDoFn public? Should they have
>>>>> included a `SquareRoot.WithKey()` transform that ignores a key?
>>>>>
>>>>> This feels like it'd be a common pattern but how to approach it feels
>>>>> awkward, not sure if I'm missing something obvious so thought I'd ask the
>>>>> group.
>>>>>
>>>>> Cheers,
>>>>> Joey
>>>>>
>>>>> --
>>>>>
>>>>> Joey Tran | Staff Developer | AutoDesigner TL
>>>>>
>>>>> *he/him*
>>>>>
>>>>> [image: Schrödinger, Inc.] <https://schrodinger.com/>
>>>>>
>>>>
>>
>> --
>> Henry Tremblay
>> Data Engineer, Best Buy
>>
>
