I would like to see an example.

From: Joey Tran <joey.t...@schrodinger.com>
Sent: Tuesday, October 15, 2024 11:09 AM
To: user@beam.apache.org
Subject: Re: Transform Pattern Question

Thinking about it over the past few days, I've come to the conclusion that shared transforms should generally also expose their DoFn classes, to make this kind of pattern easier to accommodate. Then, with a utility decorator/class that takes a DoFn, we can just modify the wrapped DoFn to operate on `KV`s and leave the keys alone.

It's not the most ergonomic pattern imo, since it requires more thought about PTransforms vs DoFns and which abstraction level is right for your needs, and you also have to know about this `Keyed[DoFn]` decorator, but it seems unavoidable.

On Sat, Oct 12, 2024 at 4:38 PM Henry Tremblay <paulhtremb...@gmail.com> wrote:

We have a similar question/issue at my work. Two solutions come to mind:

1. Wrap your inputs, transforms, etc. in functions that you can call and then chain together.
2. Use external libraries that a ParDo class can call. Then you can make these external libraries flexible and testable.

On Sat, Oct 12, 2024, 12:31 PM Joey Tran <joey.t...@schrodinger.com> wrote:

Yes. But this is a hypothetical; there could also be many operations you might want to do with the initial data.

On Sat, Oct 12, 2024, 1:47 PM Henry Tremblay <paulhtremb...@gmail.com> wrote:

So the only part of the pipeline you need to change is the transformation in the middle, after the read from the DB and before some type of write?

On Sat, Oct 12, 2024 at 3:29 AM <trs...@gmail.com> wrote:

Sounds like you want a monad, heh. It would be nice if their DoFn took a generic type and you could pass it a selector func to pick out what they need. If you can access their DoFn and it is not too complex, perhaps you could just use their processElement implementation directly?

eg

    class TheirDoFn .. {
      void processElement(...) {...}
    }

    class YourDoFn .. {
      void processElement() {
        TheirDoFn().processElement(...)
      }
    }

Depending on what annotations they're using in their processElement func, it could be trickier or not. You could pass in a mock OutputReceiver implementation, so you can wrap the results and delegate.

On Sat, 12 Oct 2024 at 08:51, XQ Hu via user <user@beam.apache.org> wrote:

This sounds like what CDC (Change Data Capture) typically does, which usually runs as a streaming pipeline.

On Fri, Oct 11, 2024 at 3:51 PM Joey Tran <joey.t...@schrodinger.com> wrote:

Another basic pattern question for the user group.

Say I have a database of records with an ID and some float property. Another team has written and published a transform `SquareRoot`. I want to write a pipeline that reads this database and outputs extended records that have (ID, foo_prop, squareroot(foo)_prop). How do I do this?

Of course I can strip my records of their ID and then pass the properties straight into `SquareRoot`, but then I have no way to link each result back to the record it corresponds to. Do I just need to ask the other team to make their SquareRootDoFn public? Should they have included a `SquareRoot.WithKey()` transform that ignores a key?

This feels like it'd be a common pattern, but how to approach it feels awkward. I'm not sure if I'm missing something obvious, so I thought I'd ask the group.

Cheers,
Joey

--
Joey Tran | Staff Developer | AutoDesigner TL
he/him
[Schrödinger, Inc.]<https://schrodinger.com/>

--
Henry Tremblay
Data Engineer, Best Buy
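
For readers who, like the first reply, want to see an example: the `Keyed[DoFn]` idea from the top of the thread might look roughly like the sketch below. It is plain Python with no Beam import, so it runs on its own. `SquareRootDoFn` and `Keyed` are hypothetical stand-ins invented for illustration, not classes from Beam or from the other team's library, and the sketch assumes the wrapped object only needs a `process(element)` method that yields its outputs.

```python
import math
from typing import Any, Iterable, Iterator, Tuple


class SquareRootDoFn:
    """Hypothetical stand-in for the other team's DoFn (not a real Beam class)."""

    def process(self, element: float) -> Iterable[float]:
        yield math.sqrt(element)


class Keyed:
    """Hypothetical wrapper: applies the wrapped DoFn to the value of a
    (key, value) pair and re-attaches the untouched key to every output."""

    def __init__(self, dofn: Any) -> None:
        self._dofn = dofn

    def process(self, element: Tuple[Any, Any]) -> Iterator[Tuple[Any, Any]]:
        key, value = element
        # Treat a None return from the wrapped process() as "no output".
        for out in self._dofn.process(value) or ():
            yield key, out


# Records are (ID, foo_prop); the goal is (ID, foo_prop, sqrt(foo_prop)).
records = [("id-1", 4.0), ("id-2", 9.0)]
keyed_sqrt = Keyed(SquareRootDoFn())
extended = [
    (key, value, root)
    for key, value in records
    for _, root in keyed_sqrt.process((key, value))
]
print(extended)  # [('id-1', 4.0, 2.0), ('id-2', 9.0, 3.0)]
```

In an actual pipeline the wrapper would presumably subclass `apache_beam.DoFn` and also forward lifecycle methods such as `setup`, `start_bundle`, `finish_bundle`, and `teardown` to the wrapped DoFn; that forwarding is left out here to keep the sketch short.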