Hi,

Thanks Abhishek!

I would like to be notified when a Drillbit process becomes orphaned, that is, when it gets disconnected from the other running Drillbits for some reason. In my case this is definitely not because of an unclean shutdown, as those Drillbits have been running for months. I know I can check the logs and kill the orphaned process, which is what I did in my case, but I would like to have a notification for a down Drillbit.

Thanks,
Divya

On Fri, 13 Jul 2018 at 04:15, Abhishek Girish <[email protected]> wrote:

> Hey Divya,
>
> It would depend on the situation, afaik. The sys.drillbits table contains a
> list of all running Drillbits. If one of the Drillbits has issues and
> cannot stay connected to the cluster, I would assume it would be
> unregistered and may not show up in the output of sys.drillbits. If it's an
> intermittent issue and the Drillbit process maintains its heartbeat
> connection, it may show up in the output.
>
> If you take a look at the logs, you might be able to figure out what is
> causing the issue. There may be orphan Drillbit processes which may have
> been left behind due to a previous unclean shutdown. Can you clean up all
> Drillbit processes (using 'ps -ef | grep -i drillbit' and then a kill -9)
> on nodes where you suspect issues and restart Drillbits?
>
> -Abhishek
>
> On Tue, Jul 10, 2018 at 7:16 PM Divya Gehlot <[email protected]>
> wrote:
>
> > Hi,
> > select * from sys.drillbits;
> > What does the above query show if a Drillbit process hangs?
> >
> >
> > Thanks
> >
> > On Tue, 10 Jul 2018 at 15:36, Khurram Faraaz <[email protected]> wrote:
> >
> > > You can run the below query, and look for the *state* column in the
> > > result of the query. Online Drillbits will be marked as ONLINE.
> > >
> > > select * from sys.drillbits;
> > >
> > > - Khurram
> > >
> > > On Tue, Jul 10, 2018 at 12:24 AM, Divya Gehlot <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > > I would like to know the best practice for checking Drillbit status
> > > > in cluster mode.
> > > > I have encountered a scenario where the Drillbit processes appear to
> > > > be running fine when checked, but the Drill Web UI shows some of the
> > > > Drillbits as down.
> > > > On doing RCA (root cause analysis), I found that the Drillbit
> > > > process had hung for some reason.
> > > > For now, the alert system which I have implemented checks the
> > > > output of
> > > >
> > > > > drill/bin/drillbit.sh status
> > > >
> > > > Is there any better way to catch a hung Drillbit process?
> > > > I would appreciate advice from Drill community users.
> > > >
> > > > Thanks,
> > > > Divya
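
One possible way to get the notification discussed in this thread is to compare a list of expected hosts against the output of `select * from sys.drillbits`. The sketch below is only an illustration, not an official Drill tool. It assumes that the Drill REST API is reachable at `/query.json` on the web port (8047 by default), that authentication is disabled, and that the result rows carry `hostname` and `state` fields. The function names and any host names passed to them are hypothetical. As noted in the replies above, a hung Drillbit that still keeps its heartbeat may continue to appear in the output, so this check mainly catches Drillbits that have dropped out of the cluster.

```python
import json
import urllib.request


def query_drillbits(base_url, timeout=10):
    """Run `select * from sys.drillbits` via the Drill REST API and return its rows.

    Assumption: base_url points at a Drillbit web endpoint such as
    http://<drill-host>:8047 with authentication disabled.
    """
    payload = json.dumps(
        {"queryType": "SQL", "query": "select * from sys.drillbits"}
    ).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # The timeout makes the check fail fast if the queried Drillbit does not answer.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("rows", [])


def find_unhealthy(expected_hosts, rows):
    """Return expected hosts that are missing from the rows or not marked ONLINE.

    Rows without a `state` field (assumed to come from older Drill versions)
    are treated as online as long as the host is listed.
    """
    online = {
        row.get("hostname")
        for row in rows
        if row.get("state", "ONLINE") == "ONLINE"
    }
    return sorted(set(expected_hosts) - online)
```

A scheduled job could call `query_drillbits` against one or more nodes, pass the rows to `find_unhealthy` together with the list of hosts that are supposed to be in the cluster, and send an alert whenever the returned list is non-empty or the request itself times out.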
