We have a similar question/issue at my work. Two solutions come to mind:

1. Wrap your inputs, transforms, etc. in functions that you can call and then chain together.
2. Use external libraries that a ParDo class can call. Then you can make these external libraries flexible and testable.

On Sat, Oct 12, 2024, 12:31 PM Joey Tran <joey.t...@schrodinger.com> wrote:

> Yes. But this is a hypothetical; there could also be many operations you
> might want to do with the initial data.
>
> On Sat, Oct 12, 2024, 1:47 PM Henry Tremblay <paulhtremb...@gmail.com>
> wrote:
>
>> So the only part of the pipeline you need to change is the
>> transformation in the middle, after the read from the DB and before some
>> type of write?
>>
>> On Sat, Oct 12, 2024 at 3:29 AM <trs...@gmail.com> wrote:
>>
>>> Sounds like you want a monad, heh.
>>>
>>> It would be nice if their DoFn took a generic type and you could pass it
>>> a selector func to pick out what they need.
>>> If you can access their DoFn and it is not too complex, perhaps you just
>>> use their processElement implementation directly?
>>>
>>> e.g.
>>>
>>> class TheirDoFn ..{ void processElement(...){...} }
>>>
>>> class YourDoFn .. {
>>>   void processElement() {
>>>     TheirDoFn().processElement(...)
>>>   }
>>> }
>>>
>>> Depending on what annotations they're using in their processElement
>>> func, it could be trickier or not. You could pass in a mock
>>> OutputReceiver implementation, so you can wrap the results and delegate.
>>>
>>> On Sat, 12 Oct 2024 at 08:51, XQ Hu via user <user@beam.apache.org>
>>> wrote:
>>>
>>>> This sounds like what CDC (Change Data Capture) typically does, which
>>>> usually runs as a streaming pipeline.
>>>>
>>>> On Fri, Oct 11, 2024 at 3:51 PM Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> Another basic pattern question for the user group.
>>>>>
>>>>> Say I have a database of records with an ID and some float property.
>>>>> Another team has written and published a transform `SquareRoot`. I want to
>>>>> write a pipeline that reads this database and outputs extended records
>>>>> that have (ID, foo_prop, squareroot(foo)_prop). How do I do this?
>>>>>
>>>>> Of course I can strip my records of their ID and then pass the
>>>>> properties straight into `SquareRoot`, but then I have no way to link it
>>>>> back to what record the square root corresponds to. Do I just need to ask
>>>>> the other team to make their SquareRootDoFn public? Should they have
>>>>> included a `SquareRoot.WithKey()` transform that ignores a key?
>>>>>
>>>>> This feels like it'd be a common pattern, but how to approach it feels
>>>>> awkward. Not sure if I'm missing something obvious, so thought I'd ask the
>>>>> group.
>>>>>
>>>>> Cheers,
>>>>> Joey
>>>>>
>>>>> --
>>>>>
>>>>> Joey Tran | Staff Developer | AutoDesigner TL
>>>>>
>>>>> *he/him*
>>>>>
>>>>> Schrödinger, Inc. <https://schrodinger.com/>
>>>>>
>>>>
>>
>> --
>> Henry Tremblay
>> Data Engineer, Best Buy
>>
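
For what it's worth, the two suggestions at the top of this reply (keep the core logic in a plain, testable function, then wrap it so the ID travels along with the value) could look roughly like the sketch below. It is plain Python with no Beam dependency, not anyone's actual implementation: `square_root` and `with_key` are hypothetical names standing in for the other team's core logic and a key-preserving wrapper, and records are assumed to be (ID, value) tuples.

```python
import math
from typing import Callable, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def square_root(value: float) -> float:
    """Hypothetical stand-in for the other team's logic, exposed as a plain function."""
    return math.sqrt(value)


def with_key(fn: Callable[[V], R]) -> Callable[[Tuple[K, V]], Tuple[K, V, R]]:
    """Wrap fn so it accepts (key, value) and returns (key, value, fn(value)).

    The key is passed through untouched, so each result stays linked
    to the record it came from.
    """
    def keyed(record: Tuple[K, V]) -> Tuple[K, V, R]:
        key, value = record
        return key, value, fn(value)

    return keyed


# Assumed record shape: (ID, foo_prop)
records = [("a", 4.0), ("b", 9.0)]
extended = [with_key(square_root)(record) for record in records]
```

In a Beam Python pipeline the wrapped function could presumably be handed to `beam.Map`, as in `beam.Map(with_key(square_root))`, but that only works if the other team publishes the core function and not just the finished transform.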