Just spotted a typo in the email below, the table structure is:

id : STRING
host_details : ARRAY<STRUCT<key : STRING, value : STRING>>
metrics : ARRAY<STRUCT<key : STRING, value : STRING>>

I.e. value is a STRING, not s STRUCT.

Thanks,

Tom

On Mon, 30 Apr 2018 at 09:13, Tom Davis <mailinglists...@gmail.com> wrote:

> Hi all,
>
>
> I have a table with a lot of logging / metrics data that looks like this:
>
>
> id: STRING
>
> host_details: ARRAY<STRUCT<key: STRING, value: STRUCT>>
>
> metrics: ARRAY<STRUCT<key: STRING, value: STRUCT>>
>
>
> I would like to be able to extract known keys and their values (from each
> of the host_details and metrics arrays-of-struct). Yes, this format sucks
> and could be a map, but technically the keys aren’t unique, although I am
> happy to make that compromise at this point. In reality the table has more
> fields in the struct.
>
>
> In Spark or MR, I could extract all the fields I need and project them to
> columns with only a single pass over the table. I’m struggling to achieve
> the same in Hive, which may be either my SQL ability or that there isn’t a
> UDF available to meet my needs.
>
>
> The most efficient approach I have found so far, uses the collect()
> Brickhouse UDF And two lateral views generated from the inline() UDTF:
>
>
> SELECT
>
> id,
>
> collect(t.key, t..value) AS map_h_m
>
> FROM
>
> (SELECT id, hds.key AS key, hds.value AS value
>
> FROM metrics
>
> LATERAL VIEW inline(metrics.host_details) hds
>
> UNION ALL
>
> SELECT id, ms.key AS key, ms.value AS value
>
> FROM metrics
>
> LATERAL VIEW inline(metrics.metrics)) t
>
> GROUP BY t.id
>
>
> I can create this as a VIEW and then extract columns from the map using
> the standard map_h_m[‘error_count’] notation.
>
>
> Using collect() / GROUP BY seems more efficient than doing a join by id (I
> think it cuts out a shuffle), but still has problems:
>
>    - it’s still scanning the metrics table multiple times
>    - It assumes that keys are unique across both host_details and
>    metrics.
>
>
> Does anyone have any thoughts before I attempt to write my own UDF(s) to
> assist?
>
>
> Many thanks,
>
> Tom
>

Reply via email to