Hi Gopal,

Thanks for the help. Can you please also let me know what argument list this script wants?
I was trying the following in the HDP Sandbox, but did not get JSON output:

[root@sandbox ~]# python ats-plan-fetcher.py --ats=http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID? --count=1
ats-plan-fetcher.py:9: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  from md5 import md5 as md5_hash
Starting to fetch from ATS URL = http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID?
Fetching /ws/v1/timeline/HIVE_QUERY_ID?&limit=25

But I can see a lot of entries at http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID?

Thanks,
Rajit

On 11/19/15, 6:29 PM, "Gopal Vijayaraghavan" <go...@hortonworks.com on behalf of gop...@apache.org> wrote:

>> We would like to capture some information in our Hadoop cluster.
>> Can anybody please suggest how we can achieve this? Are any tools
>> available already, or do we need to scrub any logs?
>
>Apache Atlas is the standardized solution for deeper analytics into data
>ownership/usage (look at the HiveHook in Atlas).
>
>> 1. We want to know how many queries are run every day.
>> 2. What are the durations of those queries?
>> 3. If any queries are failing, at which step are they failing?
>
>For a general use-case, you are probably already writing a lot of this
>data.
>
>https://gist.github.com/t3rmin4t0r/e4bf835f10271b9e466e
>
>That only pulls the query text + plans in JSON (to automatically look for
>bad plans), but the total event structure looks like this:
>
>{
>  "domain": "DEFAULT",
>  "entity": "gopal_20151119211930_bae04691-f46a-44c4-9116-bef8f854e49a",
>  "entitytype": "HIVE_QUERY_ID",
>  "events": [
>    {
>      "eventinfo": {},
>      "eventtype": "QUERY_COMPLETED",
>      "timestamp": 1447986004954
>    },
>    {
>      "eventinfo": {},
>      "eventtype": "QUERY_SUBMITTED",
>      "timestamp": 1447985970564
>    }
>  ],
>  "otherinfo": {
>    "STATUS": true,
>    "TEZ": true,
>    "MAPRED": false,
>    "QUERY": ...
>  },
>  "primaryfilters": {
>    "requestuser": [
>      "gopal"
>    ],
>    "user": [
>      "gopal"
>    ]
>  }
>}
>
>I have seen at least one custom KafkaHook to feed Hive query plans into a
>Storm pipeline, but that was custom-built to police the system after an
>ad-hoc query produced a 4.5 petabyte join.
>
>Cheers,
>Gopal
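
[Editor's note: for reference, a minimal sketch, separate from the gist script above, of pulling HIVE_QUERY_ID entities straight from the ATS timeline REST endpoint and deriving per-query duration and success status from the event structure Gopal describes. It assumes the sandbox address and the field names shown in this thread, and the Python 2 runtime visible in the sandbox output; adjust the host, port and limit for a real cluster.]

    import json
    import urllib2  # Python 2, matching the sandbox environment in this thread

    # Assumed sandbox address; entity type and limit mirror the
    # /ws/v1/timeline/HIVE_QUERY_ID?&limit=25 call in the fetcher output above.
    ATS_URL = "http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID?limit=25"

    data = json.load(urllib2.urlopen(ATS_URL))

    for entity in data.get("entities", []):
        # Index events by type: QUERY_SUBMITTED / QUERY_COMPLETED, as in the example entity
        events = dict((e["eventtype"], e["timestamp"]) for e in entity.get("events", []))
        submitted = events.get("QUERY_SUBMITTED")
        completed = events.get("QUERY_COMPLETED")
        duration_ms = completed - submitted if submitted and completed else None
        # otherinfo.STATUS is true in the example entity for a query that completed successfully
        status = entity.get("otherinfo", {}).get("STATUS")
        print "%s duration_ms=%s succeeded=%s" % (entity["entity"], duration_ms, status)

[Counting queries per day then reduces to bucketing the QUERY_SUBMITTED timestamps by date, and failed queries are the entities whose STATUS is not true.]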