It's worth noting that you never "lose" a PCollection. PCollections are immutable, so you can use the same PCollection in as many transforms as you like, and every time you reference that PCollection<A> it will be in the same state it was in when you first read it.
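A minimal sketch of that point, assuming the Beam Java SDK (beam-sdks-java-core) and the DirectRunner are on the classpath. The class name, the "key:value" string elements and the in-memory Create source are illustrative stand-ins, not anything from the original thread. One PCollection feeds two independent branches, and PAssert checks that each branch sees the full, unmodified input.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class ReusePCollection {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // In-memory stand-in for "read the source"; elements look like "key:value".
    PCollection<String> colA = p.apply("ReadSource", Create.of("a:1", "b:2", "c:3"));

    // First branch: derive a new PCollection of keys from colA.
    PCollection<String> keys =
        colA.apply(
            "ExtractKeys",
            MapElements.into(TypeDescriptors.strings()).via((String s) -> s.split(":")[0]));

    // Second branch: colA was not consumed by the first branch and can be used again.
    PCollection<Long> count = colA.apply("CountOriginal", Count.globally());

    // Both branches see all three original elements.
    PAssert.that(keys).containsInAnyOrder("a", "b", "c");
    PAssert.thatSingleton(count).isEqualTo(3L);

    // On the DirectRunner, a failed PAssert makes run() throw.
    p.run().waitUntilFinish();
  }
}
```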
So if you have:

    PCollection<A> colA = ...;
    PCollection<RedisData> redisData = colA.apply(ParDo.of(new ReadRedisDataDoFn()));

you have not consumed the colA PCollection and can reference it as many times as you want in later steps.

My instinct for this is:

1. Read the source to get PCollection<A>.
2. Pull the key to look up in Redis from PCollection<A> into another PCollection.
3. Look up with a custom DoFn if the normal IO one doesn't meet your needs.
4. Use a CoGroupByKey transform to group them together.
5. Do whatever else you need to do with the combined data.

________________________________
From: Vincent Marquez
Sent: Wednesday, July 21, 2021 12:14 PM
To: user
Subject: Mapping *part* of a PCollection possible? (Lens optics for PCollection?)

Let's say I have PCollection<A> and I want to use the 'readAll' pattern to enrich some data from an additional source such as Redis (which has a readKeys PTransform<String, RedisData>). However, I don't want to 'lose' the original A.

There *are* a few ways to do this currently (side inputs, joining two streams with CoGroupByKey, using State), all of which have some problems.

If I could map PCollection<A> into some type that holds <A, String>, for instance PCollection<KV<A, String>>, and then use the Redis readKeys to map that to PCollection<KV<A, RedisData>>, that would solve all my problems.

This is more or less a get/set lens optic, if anyone is familiar with functional programming. Is something like this possible? Could it be added? I've run into wanting this pattern numerous times over the last year.

~Vincent
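The five-step approach from the reply can be sketched roughly as follows. This is a minimal sketch, assuming the Beam Java SDK and the DirectRunner; it does not call RedisIO. The "userId|event" strings stand in for A, and LookupDoFn is a hypothetical DoFn that reads from an in-memory map where a real one would query Redis (for example with a client opened in @Setup).

```java
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Distinct;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Keys;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TypeDescriptors;

public class EnrichWithLookup {

  // Hypothetical lookup DoFn: an in-memory map stands in for Redis.
  static class LookupDoFn extends DoFn<String, KV<String, String>> {
    private static final Map<String, String> FAKE_REDIS =
        Map.of("user1", "gold", "user2", "silver");

    @ProcessElement
    public void processElement(@Element String key, OutputReceiver<KV<String, String>> out) {
      String value = FAKE_REDIS.get(key);
      if (value != null) {
        out.output(KV.of(key, value));
      }
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // 1. Read the source (in-memory stand-in for PCollection<A>).
    PCollection<String> colA =
        p.apply("ReadSource", Create.of("user1|click", "user2|view", "user1|buy"));

    // 2. Key each original element by its lookup key; colA itself is untouched.
    PCollection<KV<String, String>> keyedA =
        colA.apply(
            "KeyByUser",
            MapElements.into(TypeDescriptors.kvs(TypeDescriptors.strings(), TypeDescriptors.strings()))
                .via((String s) -> KV.of(s.split("\\|")[0], s)));

    // Distinct keys, so each key is looked up once and yields at most one value.
    PCollection<String> keys = keyedA.apply("Keys", Keys.<String>create());
    PCollection<String> distinctKeys = keys.apply("DistinctKeys", Distinct.<String>create());

    // 3. Look up each key with the custom DoFn.
    PCollection<KV<String, String>> looked =
        distinctKeys.apply("Lookup", ParDo.of(new LookupDoFn()));

    // 4. CoGroupByKey brings the originals and the looked-up values together.
    TupleTag<String> originalTag = new TupleTag<>();
    TupleTag<String> lookupTag = new TupleTag<>();
    PCollection<KV<String, CoGbkResult>> joined =
        KeyedPCollectionTuple.of(originalTag, keyedA)
            .and(lookupTag, looked)
            .apply("Join", CoGroupByKey.create());

    // 5. Use the combined data: pair every original element with its looked-up value.
    PCollection<String> enriched =
        joined.apply(
            "Pair",
            ParDo.of(
                new DoFn<KV<String, CoGbkResult>, String>() {
                  @ProcessElement
                  public void processElement(
                      @Element KV<String, CoGbkResult> e, OutputReceiver<String> out) {
                    // getOnly returns the default when no lookup value exists for the key.
                    String value = e.getValue().getOnly(lookupTag, "unknown");
                    for (String original : e.getValue().getAll(originalTag)) {
                      out.output(original + " -> " + value);
                    }
                  }
                }));

    PAssert.that(enriched)
        .containsInAnyOrder("user1|click -> gold", "user2|view -> silver", "user1|buy -> gold");

    p.run().waitUntilFinish();
  }
}
```

The Distinct step matters because getOnly fails if a key has more than one looked-up value. With unbounded input, Distinct and CoGroupByKey both need a non-global window or a trigger. If the lookup DoFn emitted KV.of(original, value) directly instead of KV.of(key, value), the join would disappear, which is close to the PCollection<KV<A, RedisData>> shape asked about in the original message.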
