Nested fields are not part of standard SQL AFAIK. Beam goes further and supports arrays of arrays, etc.
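To make the nested-array question concrete, here is a minimal sketch in plain Python of the explode semantics described further down the thread. It is not Beam API: `Row` is just a dict standing in for a schema'd row, and `explode` is a hypothetical helper name. It assumes that only the named field is unnested, that one call removes exactly one level of nesting, and that an empty array produces no output rows (as Spark's explode does).

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]  # stand-in for a Beam Row; a plain dict, not Beam API


def explode(row: Row, field: str) -> Iterator[Row]:
    """Yield one copy of `row` per element of the array stored in `field`.

    The array field is replaced by a single element and every other field
    is duplicated unchanged. An empty array yields no rows (an assumption).
    """
    for element in row[field]:
        out = dict(row)       # shallow copy, so the other fields are duplicated
        out[field] = element  # array of T becomes a single T
        yield out


rows: List[Row] = [
    {"id": 1, "tags": ["a", "b"], "grid": [[1, 2], [3]]},
    {"id": 2, "tags": [], "grid": [[4]]},
]

# Only the named field is unnested; "grid" is left untouched.
by_tags = [out for row in rows for out in explode(row, "tags")]

# For an array of arrays, one call removes one level of nesting.
by_grid = [out for row in rows for out in explode(row, "grid")]
```

In the Python SDK a generator like this could presumably be handed to `beam.FlatMap(explode, "tags")`, which is roughly the workaround mentioned in the original question. A built-in transform would additionally have to derive the output schema, which this sketch ignores.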
On Wed, Jan 13, 2021 at 11:42 AM Kenneth Knowles <[email protected]> wrote:

> Just the fields specified, IMO. When in doubt, copy SQL. (and I mean SQL
> generally, not just Beam SQL)
>
> Kenn
>
> On Wed, Jan 13, 2021 at 11:17 AM Reuven Lax <[email protected]> wrote:
>
>> Definitely could be a top-level transform. Should it automatically unnest
>> all arrays, or just the fields specified?
>>
>> We do have to define the semantics for nested arrays as well.
>>
>> On Wed, Jan 13, 2021 at 10:57 AM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> Ah, thanks for the clarification. UNNEST does sound like what you want
>>> here, and would likely make sense as a top-level relational transform as
>>> well as being supported by SQL.
>>>
>>> On Wed, Jan 13, 2021 at 10:53 AM Tao Li <[email protected]> wrote:
>>>
>>>> @Kyle Weaver <[email protected]> sure thing! So the input/output
>>>> definition for the Flatten.Iterables
>>>> <https://beam.apache.org/releases/javadoc/2.25.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html>
>>>> is:
>>>>
>>>> Input: PCollection<Iterable<T>>
>>>>
>>>> Output: PCollection<T>
>>>>
>>>> The input/output for an explode transform would look like this:
>>>>
>>>> Input: PCollection<Row> The row schema has a field which is an array
>>>> of T
>>>>
>>>> Output: PCollection<Row> The array type field from the input schema is
>>>> replaced with a new field of type T. The elements from the array type field
>>>> are flattened into multiple rows in the new table (other fields of the input
>>>> table are just duplicated).
>>>>
>>>> Hope this clarification helps!
>>>>
>>>> *From: *Kyle Weaver <[email protected]>
>>>> *Reply-To: *"[email protected]" <[email protected]>
>>>> *Date: *Tuesday, January 12, 2021 at 4:58 PM
>>>> *To: *"[email protected]" <[email protected]>
>>>> *Cc: *Reuven Lax <[email protected]>
>>>> *Subject: *Re: Is there an array explode function/transform?
>>>>
>>>> How is it different? It'd help if you could provide the signature
>>>> (input and output PCollection types) of the transform you have in mind.
>>>>
>>>> On Tue, Jan 12, 2021 at 4:49 PM Tao Li <[email protected]> wrote:
>>>>
>>>> @Reuven Lax <[email protected]> yes I am aware of that transform, but
>>>> that’s different from the explode operation I was referring to:
>>>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>>>
>>>> *From: *Reuven Lax <[email protected]>
>>>> *Reply-To: *"[email protected]" <[email protected]>
>>>> *Date: *Tuesday, January 12, 2021 at 2:04 PM
>>>> *To: *user <[email protected]>
>>>> *Subject: *Re: Is there an array explode function/transform?
>>>>
>>>> Have you tried Flatten.iterables?
>>>>
>>>> On Tue, Jan 12, 2021, 2:02 PM Tao Li <[email protected]> wrote:
>>>>
>>>> Hi community,
>>>>
>>>> Is there a Beam function to explode an array (similar to Spark SQL’s
>>>> explode())? I did some research but did not find anything.
>>>>
>>>> BTW I think we can potentially use FlatMap to implement the explode
>>>> functionality, but a Beam-provided function would be very handy.
>>>>
>>>> Thanks a lot!
