>
> Impala has two sets of information tracked on the coordinator node for
> each query: a summary and a profile.
> The profile is currently accessible as a string, which is unwieldy for
> parsing. A thrift format is theoretically available, but there is a bug:
> https://issues.apache.org/jira/browse/IMPALA-8252 , which is resolved in
> v3.2.0. So you need to have version >=3.2


The thrift format generally works fine; I know of a lot of tooling built on
top of it (e.g. Cloudera Manager uses it extensively). The title of the
JIRA sounds overly dramatic without context. Basically, we had some issues
with compatibility across versions. You'll be fine if you use the .thrift
file corresponding to the version of Impala you're consuming profiles from.
It's messier if you have a tool that uses an old thrift file, since there
were some issues with backward compatibility, or if you're trying to
consume profiles from multiple versions of Impala.

There's a toy Python profile decoder in the Impala source tree that may be
useful to get started:
https://github.com/apache/impala/blob/master/bin/parse-thrift-profile.py
and
https://github.com/apache/impala/blob/24eab713a0d35f629509f59711f8a563e1346acf/lib/python/impala_py_lib/profiles.py
That just gets you from the base64-encoded strings to a thrift object.
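
As a rough sketch of that first decoding step (not the actual script): as far
as I can tell, the encoded profile is base64 text wrapping zlib-compressed
thrift bytes, which are then deserialized with the compact protocol. The
function name decode_profile_payload is made up for illustration, and the
commented-out thrift step assumes you have generated Python classes from the
RuntimeProfile.thrift file matching your Impala version.

```python
import base64
import zlib


def decode_profile_payload(encoded):
    """Return the raw thrift bytes from an encoded profile string.

    Assumption: the input is base64 text wrapping zlib-compressed data,
    which is what the toy decoder in the Impala tree appears to expect.
    """
    compressed = base64.b64decode(encoded)
    return zlib.decompress(compressed)


# Turning the bytes into a profile object needs the thrift library plus
# classes generated from RuntimeProfile.thrift. The import path of the
# generated module below is an assumption about your build layout:
#
#   from thrift.protocol import TCompactProtocol
#   from thrift.TSerialization import deserialize
#   from RuntimeProfile.ttypes import TRuntimeProfileTree
#
#   tree = deserialize(
#       TRuntimeProfileTree(),
#       decode_profile_payload(encoded),
#       protocol_factory=TCompactProtocol.TCompactProtocolFactory())
```

Keeping the base64/zlib step separate from the thrift step makes it easier to
swap in the generated classes for whichever Impala version produced the
profile.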

A JSON format was added very recently (this week) into master -
https://gerrit.cloudera.org/#/c/13801/. That's kinda experimental at the
moment - we're not sure how convenient the current structure is without
some experience actually using it - we'd welcome feedback about your use
cases.

- Tim



On Fri, Aug 9, 2019 at 4:14 PM Antoni Ivanov <aiva...@vmware.com> wrote:

> Hi,
>
>
>
> We did some research on the topic. The answer we’ve come to so far is:
>
>
>
> Impala has two sets of information tracked on the coordinator node for
> each query: a summary and a profile.
>
> The profile is currently accessible as a string, which is unwieldy for
> parsing. A thrift format is theoretically available, but there is a bug:
> https://issues.apache.org/jira/browse/IMPALA-8252 , which is resolved in
> v3.2.0. So you need to have version >=3.2
>
>
>
>
>
> After that, the Thrift encoder from Twitter commons may be used –
>
>
> https://github.com/twitter/commons/blob/06905dc0f1a26440a79ff1164831c85ce2d1bdf0/src/python/twitter/thrift/text/thrift_json_encoder.py
>
>
>
>
>
> The thrift can be downloaded from the coordinator node, e.g.
> http://coord-node:25000/query_profile_encoded?query_id=442c057197d9c0d:81810ccd00000000
> (442c057197d9c0d:81810ccd00000000 is the query ID).
>
> The thrift can also be downloaded from the Cloudera REST API (if using Cloudera).
> Or, if using the impyla <https://github.com/cloudera/impyla> Python library,
> you can get the profile after execution:
>
>         cur.execute(sql)
>
>         return cur.get_profile(profile_format=TRuntimeProfileFormat.THRIFT)
>
>
>
>
>
> Just posting here in case it’s helpful to anyone following the user
> group.
>
>
>
> -Antoni
>
>
>
> *From:* Antoni Ivanov
> *Sent:* Wednesday, August 7, 2019 10:13 AM
> *To:* user@impala.apache.org
> *Cc:* dev@impala <d...@impala.apache.org>; Jenny Kwan (c) <
> kje...@vmware.com>
> *Subject:* How to parse a query plan /summary/profile
>
>
>
> Hi,
>
>
>
> We’d like to get better visibility into the way our Impala cluster is used.
>
> For example, there’s per-node utilization: sometimes fragments on a
> given node are slower, and this is visible in the profile. There are also some
> statistics available only in the profile (like runtime filters used or Parquet
> file pruning stats).
>
>
>
> I think you can download it as Thrift? But is it easily deserializable?
> (We need to have the Thrift schema at least, I think.)
>
> Thanks,
>
> Antoni
>
>
>
