Hi Gopal,

Thanks for the help. Can you please also let me know what argument list this script wants?
I was trying the following in the HDP Sandbox, but did not get JSON output:

[root@sandbox ~]# python ats-plan-fetcher.py --ats=http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID? --count=1
ats-plan-fetcher.py:9: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  from md5 import md5 as md5_hash
Starting to fetch from ATS URL = http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID?
Fetching /ws/v1/timeline/HIVE_QUERY_ID?&limit=25

But I can see a lot of entries at http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID?

Thanks,
Rajit

On 11/19/15, 6:29 PM, "Gopal Vijayaraghavan" <go...@hortonworks.com on behalf of gop...@apache.org> wrote:

>> We would like to capture some information in our Hadoop cluster.
>> Can anybody please suggest how we can achieve this? Are any tools
>> available already, or do we need to scrub any logs?
>
>Apache Atlas is the standardized solution for deeper analytics into data
>ownership/usage (look at the HiveHook in Atlas).
>
>> 1. We want to know how many queries are run every day.
>> 2. What are the durations of those queries?
>> 3. If any queries are failing, at which step are they failing?
>
>For a general use-case, you are probably already writing a lot of this
>data.
>
>https://gist.github.com/t3rmin4t0r/e4bf835f10271b9e466e
>
>That only pulls the query text + plans in JSON (to automatically look for
>bad plans), but the total event structure looks like this:
>
>{
>  "domain": "DEFAULT",
>  "entity": "gopal_20151119211930_bae04691-f46a-44c4-9116-bef8f854e49a",
>  "entitytype": "HIVE_QUERY_ID",
>  "events": [
>    {
>      "eventinfo": {},
>      "eventtype": "QUERY_COMPLETED",
>      "timestamp": 1447986004954
>    },
>    {
>      "eventinfo": {},
>      "eventtype": "QUERY_SUBMITTED",
>      "timestamp": 1447985970564
>    }
>  ],
>  "otherinfo": {
>    "STATUS": true,
>    "TEZ": true,
>    "MAPRED": false,
>    "QUERY": ...
>  },
>  "primaryfilters": {
>    "requestuser": [
>      "gopal"
>    ],
>    "user": [
>      "gopal"
>    ]
>  }
>}
>
>I have seen at least one custom KafkaHook to feed Hive query plans into a
>Storm pipeline, but that was custom-built to police the system after an
>ad-hoc query produced a 4.5 petabyte join.
>
>Cheers,
>Gopal
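
[Editor's note: for reference, a minimal sketch, separate from the gist script above, of pulling HIVE_QUERY_ID entities straight from the ATS timeline REST endpoint and deriving per-query duration and success status from the event structure Gopal describes. It assumes the sandbox address and the field names shown in this thread, and the Python 2 runtime visible in the sandbox output; adjust the host, port and limit for a real cluster.]

    import json
    import urllib2  # Python 2, matching the sandbox environment in this thread

    # Assumed sandbox address; entity type and limit mirror the
    # /ws/v1/timeline/HIVE_QUERY_ID?&limit=25 call in the fetcher output above.
    ATS_URL = "http://127.0.0.1:8188/ws/v1/timeline/HIVE_QUERY_ID?limit=25"

    data = json.load(urllib2.urlopen(ATS_URL))

    for entity in data.get("entities", []):
        # Index events by type: QUERY_SUBMITTED / QUERY_COMPLETED, as in the example entity
        events = dict((e["eventtype"], e["timestamp"]) for e in entity.get("events", []))
        submitted = events.get("QUERY_SUBMITTED")
        completed = events.get("QUERY_COMPLETED")
        duration_ms = completed - submitted if submitted and completed else None
        # otherinfo.STATUS is true in the example entity for a query that completed successfully
        status = entity.get("otherinfo", {}).get("STATUS")
        print "%s duration_ms=%s succeeded=%s" % (entity["entity"], duration_ms, status)

[Counting queries per day then reduces to bucketing the QUERY_SUBMITTED timestamps by date, and failed queries are the entities whose STATUS is not true.]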