+1 on a configuration parameter to enable this, leaving the current behaviour as the default.
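To make that concrete, here's a rough sketch of how the opt-in might surface on the schema-aware read path. withMapsInferred() is just the name Jeff floated below and doesn't exist yet; the rest is the existing readTableRowsWithSchema() surface:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.schemas.transforms.Convert;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    Pipeline p = Pipeline.create();

    // Hypothetical opt-in: withMapsInferred() is the name proposed below and
    // does not exist yet; everything else is the current schema-aware read.
    PCollection<Row> rows =
        p.apply(
                BigQueryIO.readTableRowsWithSchema()
                    .from("my-project:my_dataset.my_table")
                    .withMapsInferred()) // opt in: repeated key/value structs become MAP
            .apply(Convert.toRows());    // without the flag: ARRAY<ROW<key, value>> as today

Keeping the flag off by default means nobody's existing reads of ARRAY<STRUCT<key, value>> fields change shape underneath them.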
On Sat, Oct 10, 2020 at 12:35 AM Andrew Pilloud <apill...@google.com> wrote:

> BigQuery has no native support for Map types, but I agree that we should
> be consistent with how other tools import maps into BigQuery. Is this
> something Dataflow templates do? What other tools are there?
>
> Beam ZetaSQL also lacks support for Map types. I like the idea of adding
> a configuration parameter to turn this on and retaining the existing
> behavior by default.
>
> Thanks for sending this to the list!
>
> Andrew
>
> On Fri, Oct 9, 2020 at 7:20 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>
>> It's definitely desirable to be able to get back Map types from BQ, and
>> it's nice that BQ is consistent in representing maps as repeated
>> key/value structs. Inferring maps from that specific structure is
>> preferable to inventing some new naming convention for the fields, which
>> would hinder interoperability with non-Beam applications.
>>
>> Would it be possible to add a configurable parameter called something
>> like withMapsInferred()? Default behavior would be the status quo, but
>> users could opt in to the behavior of inferring maps based on field
>> names. This would prevent the PR change from potentially breaking
>> existing applications. And it means the least surprising behavior
>> remains the default.
>>
>> On Fri, Oct 9, 2020 at 6:06 AM Worley, Ryan <ryan.wor...@monster.com>
>> wrote:
>>
>>> https://github.com/apache/beam/pull/12389
>>>
>>> Hi everyone, in the above pull request I am attempting to add support
>>> for writing Avro records with maps to a BigQuery table (via Beam
>>> Schema). The write portion is fairly straightforward - we convert the
>>> map to an array of structs with key and value fields (seemingly the
>>> closest possible approximation of a map in BigQuery). But the read
>>> back portion is more controversial because we simply check if a field
>>> is an array of structs with exactly two fields - key and value - and
>>> assume that should be read into a Schema map field.
>>>
>>> So the possibility exists that an array of structs with key and value
>>> fields, which wasn't originally written from a map, could be
>>> unexpectedly read into a map. In the PR review I suggested a few
>>> options for tagging the BigQuery field, so that we could know it was
>>> written from a Beam Schema map and should be read back into one, but
>>> I'm not very satisfied with any of the options.
>>>
>>> Andrew Pilloud suggested that I write to this group to get some
>>> feedback on the issue. Should we be concerned that all arrays of
>>> structs with exactly 'key' and 'value' fields would be read into a
>>> Schema map or could this be considered a feature? If the former, how
>>> would you suggest that we limit reading into a map only those fields
>>> that were originally written from a map?
>>>
>>> Thanks for any feedback to help bump this PR along!
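For anyone skimming the thread, the structural check at issue is roughly the following. This is an illustrative sketch against the BigQuery model classes, not the PR's actual code:

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import java.util.List;

    // Sketch of the inference described above (not the PR's code): a REPEATED
    // RECORD/STRUCT with exactly two subfields named "key" and "value" would
    // be read as a Beam Schema MAP when the flag is enabled.
    static boolean looksLikeMapField(TableFieldSchema field) {
      if (!"REPEATED".equals(field.getMode())) {
        return false;
      }
      String type = field.getType();
      if (!"RECORD".equals(type) && !"STRUCT".equals(type)) {
        return false;
      }
      List<TableFieldSchema> sub = field.getFields();
      return sub != null
          && sub.size() == 2
          && "key".equals(sub.get(0).getName())
          && "value".equals(sub.get(1).getName());
    }

Any repeated struct a user happens to define with exactly that shape would match, which is Ryan's concern, and exactly why the inference should stay behind an opt-in flag rather than becoming the default.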