+1 on a configuration parameter to enable this, leaving the current behaviour as the default.
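To make that concrete, here's a rough sketch of how the opt-in might surface on the schema-aware read path. withMapsInferred() is just the name Jeff floated below and doesn't exist yet; the rest is the existing readTableRowsWithSchema() surface:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.schemas.transforms.Convert;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    Pipeline p = Pipeline.create();

    // Hypothetical opt-in: withMapsInferred() is the name proposed below and
    // does not exist yet; everything else is the current schema-aware read.
    PCollection<Row> rows =
        p.apply(
                BigQueryIO.readTableRowsWithSchema()
                    .from("my-project:my_dataset.my_table")
                    .withMapsInferred()) // opt in: repeated key/value structs become MAP
            .apply(Convert.toRows());    // without the flag: ARRAY<ROW<key, value>> as today

Keeping the flag off by default means nobody's existing reads of ARRAY<STRUCT<key, value>> fields change shape underneath them.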
On Sat, Oct 10, 2020 at 12:35 AM Andrew Pilloud <apill...@google.com> wrote:

> BigQuery has no native support for Map types, but I agree that we should
> be consistent with how other tools import maps into BigQuery. Is this
> something Dataflow templates do? What other tools are there?
>
> Beam ZetaSQL also lacks support for Map types. I like the idea of adding
> a configuration parameter to turn this on and retaining the existing
> behavior by default.
>
> Thanks for sending this to the list!
>
> Andrew
>
> On Fri, Oct 9, 2020 at 7:20 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>
>> It's definitely desirable to be able to get back Map types from BQ, and
>> it's nice that BQ is consistent in representing maps as repeated
>> key/value structs. Inferring maps from that specific structure is
>> preferable to inventing some new naming convention for the fields, which
>> would hinder interoperability with non-Beam applications.
>>
>> Would it be possible to add a configurable parameter called something
>> like withMapsInferred()? Default behavior would be the status quo, but
>> users could opt in to the behavior of inferring maps based on field
>> names. This would prevent the PR change from potentially breaking
>> existing applications. And it means the least surprising behavior
>> remains the default.
>>
>> On Fri, Oct 9, 2020 at 6:06 AM Worley, Ryan <ryan.wor...@monster.com>
>> wrote:
>>
>>> https://github.com/apache/beam/pull/12389
>>>
>>> Hi everyone, in the above pull request I am attempting to add support
>>> for writing Avro records with maps to a BigQuery table (via Beam
>>> Schema). The write portion is fairly straightforward - we convert the
>>> map to an array of structs with key and value fields (seemingly the
>>> closest possible approximation of a map in BigQuery). But the read
>>> back portion is more controversial because we simply check if a field
>>> is an array of structs with exactly two fields - key and value - and
>>> assume that should be read into a Schema map field.
>>>
>>> So the possibility exists that an array of structs with key and value
>>> fields, which wasn't originally written from a map, could be
>>> unexpectedly read into a map. In the PR review I suggested a few
>>> options for tagging the BigQuery field, so that we could know it was
>>> written from a Beam Schema map and should be read back into one, but
>>> I'm not very satisfied with any of the options.
>>>
>>> Andrew Pilloud suggested that I write to this group to get some
>>> feedback on the issue. Should we be concerned that all arrays of
>>> structs with exactly 'key' and 'value' fields would be read into a
>>> Schema map or could this be considered a feature? If the former, how
>>> would you suggest that we limit reading into a map only those fields
>>> that were originally written from a map?
>>>
>>> Thanks for any feedback to help bump this PR along!
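For anyone skimming the thread, the structural check at issue is roughly the following. This is an illustrative sketch against the BigQuery model classes, not the PR's actual code:

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import java.util.List;

    // Sketch of the inference described above (not the PR's code): a REPEATED
    // RECORD/STRUCT with exactly two subfields named "key" and "value" would
    // be read as a Beam Schema MAP when the flag is enabled.
    static boolean looksLikeMapField(TableFieldSchema field) {
      if (!"REPEATED".equals(field.getMode())) {
        return false;
      }
      String type = field.getType();
      if (!"RECORD".equals(type) && !"STRUCT".equals(type)) {
        return false;
      }
      List<TableFieldSchema> sub = field.getFields();
      return sub != null
          && sub.size() == 2
          && "key".equals(sub.get(0).getName())
          && "value".equals(sub.get(1).getName());
    }

Any repeated struct a user happens to define with exactly that shape would match, which is Ryan's concern, and exactly why the inference should stay behind an opt-in flag rather than becoming the default.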