> Impala has two sets of information tracked on the coordinator node for
> each query: a summary and a profile.
> The profile is currently accessible as a string, which is unwieldy for
> parsing. A thrift format is theoretically available, but there is a bug:
> https://issues.apache.org/jira/browse/IMPALA-8252 , which is resolved in
> v3.2.0. So you need to have version >=3.2

The thrift format generally works fine. I know of a lot of tooling built on
top of it (e.g. Cloudera Manager uses it extensively). The title of the JIRA
sounds overly dramatic without context; basically, we had some issues with
compatibility across versions. You'll be fine if you use the .thrift file
corresponding to the version of Impala you're consuming profiles from. It's
messier if you have a tool that uses an old thrift file, since there were
some issues with backward compatibility, or if you're trying to consume
profiles from multiple versions of Impala.

There's a toy Python profile decoder in the Impala source tree that may be
useful to get started:
https://github.com/apache/impala/blob/master/bin/parse-thrift-profile.py
and
https://github.com/apache/impala/blob/24eab713a0d35f629509f59711f8a563e1346acf/lib/python/impala_py_lib/profiles.py
That just gets you from the base64-encoded strings to a thrift object.

A JSON format was added very recently (this week) into master:
https://gerrit.cloudera.org/#/c/13801/
That's kinda experimental at the moment. We're not sure how convenient the
current structure is without some experience actually using it, so we'd
welcome feedback about your use cases.

- Tim

On Fri, Aug 9, 2019 at 4:14 PM Antoni Ivanov <aiva...@vmware.com> wrote:

> Hi,
>
> We did some research on the topic. The answer we've come to so far is:
>
> Impala has two sets of information tracked on the coordinator node for
> each query: a summary and a profile.
>
> The profile is currently accessible as a string, which is unwieldy for
> parsing. A thrift format is theoretically available, but there is a bug:
> https://issues.apache.org/jira/browse/IMPALA-8252 , which is resolved in
> v3.2.0. So you need to have version >=3.2
>
> After that, the Thrift encoding from Twitter commons may be used:
> https://github.com/twitter/commons/blob/06905dc0f1a26440a79ff1164831c85ce2d1bdf0/src/python/twitter/thrift/text/thrift_json_encoder.py
>
> The thrift can be downloaded from the Coordinator node, e.g.
> http://coord-node:25000/query_profile_encoded?query_id=442c057197d9c0d:81810ccd00000000
> (442c057197d9c0d:81810ccd00000000 is the Query ID)
>
> The thrift can be downloaded from the Cloudera REST API (if using Cloudera).
> Or, if using the impyla <https://github.com/cloudera/impyla> Python library,
> you can get the profile after execution:
>
> cur.execute(sql)
> return cur.get_profile(profile_format=TRuntimeProfileFormat.THRIFT)
>
> Just posting here in case it's helpful to anyone following the user
> group.
>
> -Antoni
>
> *From:* Antoni Ivanov
> *Sent:* Wednesday, August 7, 2019 10:13 AM
> *To:* user@impala.apache.org
> *Cc:* dev@impala <d...@impala.apache.org>; Jenny Kwan (c) <
> kje...@vmware.com>
> *Subject:* How to parse a query plan /summary/profile
>
> Hi,
>
> We'd like to get better visibility into the way our Impala cluster is used.
>
> For example, there's per-node utilization: sometimes fragments on a
> given node are slower, and this is visible in the profile. Or there are some
> statistics available only in the profile (like runtime filters used or
> Parquet file pruning stats).
>
> I think you can download it as Thrift? But is it easily deserializable?
> (We need to have the Thrift schema at least, I think.)
>
> Thanks,
>
> Antoni
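
For anyone following the thread later, the first two steps discussed above (building the coordinator URL and unwrapping the encoded profile) might be sketched roughly as below. This is only a sketch under assumptions: the helper names profile_url and decode_profile_payload are made up for illustration, port 25000 is taken from the example URL in the thread, and the code assumes the encoded profile is base64 text wrapping zlib-compressed thrift bytes. Check the toy decoder linked above before relying on that layout.

```python
import base64
import zlib
from urllib.parse import urlencode


def profile_url(host, query_id, port=25000):
    """Build the coordinator URL for the encoded profile (hypothetical helper).

    The path and the default port are copied from the example in this thread.
    """
    # safe=":" leaves the colon in the query ID unescaped, matching the thread.
    query = urlencode({"query_id": query_id}, safe=":")
    return "http://{}:{}/query_profile_encoded?{}".format(host, port, query)


def decode_profile_payload(encoded):
    """Turn the encoded profile text into serialized thrift bytes.

    Assumption (not confirmed in the thread): the payload is base64 text
    that wraps zlib-compressed bytes.
    """
    if isinstance(encoded, str):
        encoded = encoded.encode("ascii")
    compressed = base64.b64decode(encoded.strip())
    return zlib.decompress(compressed)
```

The bytes returned by decode_profile_payload would still need to be deserialized with Python classes generated from the .thrift file that matches your Impala version, as Tim notes above. That step is left out here because it depends on generated code that is not part of the standard library.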