I've got two nodes currently running Drill, so I ran the lsof count on both, before and after restarting.
Node mentioned previously: Before: 396  After: 396
Other node: Before: 14  After: 395

On Monday, January 11, 2016, Hanifi GUNES <[email protected]> wrote:

> Can you run the following on a clean Drillbit (after restart) to
> report the number of open files? Please run this before & after
> executing your queries.
>
> 1: lsof -a -p DRILL_PID | wc -l
>
> 2016-01-11 14:13 GMT-08:00 Ian Maloney <[email protected]>:
>
> > I don't know a lot about checking the limits, but I'm using Hive
> > (on HDFS). I'm able to perform a count on a Hive table, so I don't
> > think the issue is there. The Hive client and drillbit are on
> > different nodes, though. I did a ulimit -a and got:
> >
> > core file size          (blocks, -c) 0
> > data seg size           (kbytes, -d) unlimited
> > scheduling priority             (-e) 0
> > file size               (blocks, -f) unlimited
> > pending signals                 (-i) 31403
> > max locked memory       (kbytes, -l) 64
> > max memory size         (kbytes, -m) unlimited
> > open files                      (-n) 1024
> > pipe size            (512 bytes, -p) 8
> > POSIX message queues     (bytes, -q) 819200
> > real-time priority              (-r) 0
> > stack size              (kbytes, -s) 10240
> > cpu time               (seconds, -t) unlimited
> > max user processes              (-u) 1024
> > virtual memory          (kbytes, -v) unlimited
> > file locks                      (-x) unlimited
> >
> > If you have any steps you'd like me to take to help debug, I'd be
> > happy to attempt them.
> >
> > On Monday, January 11, 2016, Hanifi GUNES <[email protected]> wrote:
> >
> > > Interesting. Can you check your procfs to confirm Drill opens an
> > > excessive number of file descriptors after making sure that query
> > > is complete through the Web UI? If so, we likely have some leak.
> > > Also, can you confirm the soft & hard limits for open file
> > > descriptors are not set too low on your platform?
> > >
> > > -Hanifi
> > >
> > > 2016-01-11 13:13 GMT-08:00 Ian Maloney <[email protected]>:
> > >
> > > > Hanifi,
> > > >
> > > > Thanks for the suggestions. I'm not using any concurrency at the
> > > > moment. In the web UI under profiles I see "No running queries";
> > > > the error is still happening, even after restarting the bit.
> > > >
> > > > On Monday, January 11, 2016, Hanifi GUNES <[email protected]>
> > > > wrote:
> > > >
> > > > > - "Any ideas on how to keep the (suspected) resource leak from
> > > > > happening?"
> > > > >
> > > > > The answer to this depends on your workload as well. You have
> > > > > mentioned you are running a lot of queries, so Drill might
> > > > > ordinarily open too many descriptors to serve the high demand.
> > > > > Of course, this assumes that you do not limit the level of
> > > > > concurrency. If so, why don't you try enabling queues to bound
> > > > > concurrent execution and run your workload again? [1]
> > > > >
> > > > > You can always open up the web UI to see if your queries are
> > > > > completed or hanging around. If you see too many queries
> > > > > pending and stuck for a long time, it is a good indicator of
> > > > > the problem I described above. [2]
> > > > >
> > > > > -Hanifi
> > > > >
> > > > > 1: https://drill.apache.org/docs/enabling-query-queuing/
> > > > > 2: https://drill.apache.org/docs/query-profiles/
> > > > >
> > > > > 2016-01-11 10:35 GMT-08:00 Ian Maloney <[email protected]>:
> > > > >
> > > > > > Hi Abdel,
> > > > > >
> > > > > > I just noticed I'm still on v1.2, so maybe upgrading will
> > > > > > help. I could open a JIRA, if need be.
> > > > > >
> > > > > > As far as restarting and reproducing: I restarted, and after
> > > > > > running my app with the JDBC logic for a while, I get an
> > > > > > IOException: Error accessing /
> > > > > > I paused the app and restarted the specific bit from the
> > > > > > JDBC connection below, which gave me the "Too many open
> > > > > > files" exception again.
> > > > > >
> > > > > > Now, restarting that bit won't fix the error.
> > > > > >
> > > > > > On Monday, January 11, 2016, Abdel Hakim Deneche
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Ian,
> > > > > > >
> > > > > > > Can you open up a JIRA for this? Is it easy to reproduce?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Mon, Jan 11, 2016 at 8:59 AM, Ian Maloney
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I've been running a lot of queries via JDBC/Drill. I
> > > > > > > > have four drillbits, but I could not get the ZooKeeper
> > > > > > > > JDBC URL to work, so I used:
> > > > > > > > jdbc:drill:drillbit=a-bits-hostname
> > > > > > > >
> > > > > > > > Now I get a SocketException for too many open files,
> > > > > > > > even when accessing via the CLI. I imagine I could
> > > > > > > > restart the bits, but for something intended for
> > > > > > > > production, that doesn't seem like a viable solution.
> > > > > > > > Any ideas on how to keep the (suspected) resource leak
> > > > > > > > from happening?
> > > > > > > >
> > > > > > > > I'm closing ResultSet, Statement, and Connection after
> > > > > > > > each query.
> > > > > > >
> > > > > > > --
> > > > > > > Abdelhakim Deneche
> > > > > > >
> > > > > > > Software Engineer
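
Closing ResultSet, Statement, and Connection after each query only
prevents a client-side descriptor leak if the closes also run on
exception paths. Below is a minimal sketch of that pattern using
try-with-resources; it is not the original poster's code. The
direct-to-drillbit URL is the one quoted in the thread; the
ZooKeeper-based URL follows the documented jdbc:drill:zk= form, with
the quorum hosts as placeholders and the /drill directory and
drillbits1 cluster-id as defaults, and the query against sys.version
is just an example.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DrillQueryExample {
        // Direct-to-drillbit URL, as used in the thread (pins one node):
        static final String DIRECT_URL =
                "jdbc:drill:drillbit=a-bits-hostname";

        // ZooKeeper-based URL (documented form, placeholder hosts);
        // lets ZooKeeper pick a drillbit instead of pinning one:
        static final String ZK_URL =
                "jdbc:drill:zk=zk1:2181,zk2:2181,zk3:2181/drill/drillbits1";

        public static void main(String[] args) throws Exception {
            // try-with-resources closes rs, stmt, and conn in reverse
            // order even if executeQuery or next() throws.
            try (Connection conn = DriverManager.getConnection(DIRECT_URL);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT * FROM sys.version")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If the app opens a fresh Connection per query, reusing one Connection
(or a small pool) across queries also reduces socket churn on both the
client and the drillbit, which matters with an open-files limit of
1024 as in the ulimit output above.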
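
Besides lsof/procfs from the outside, a JVM can report its own
descriptor usage, which helps tell a client-side leak from a
drillbit-side one. Below is a minimal sketch, assuming an
OpenJDK/HotSpot JVM on a Unix platform (the
com.sun.management.UnixOperatingSystemMXBean API is JDK-specific);
the class name FdWatch is a placeholder, not something from the
thread.

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    public class FdWatch {
        public static void main(String[] args) {
            OperatingSystemMXBean os =
                    ManagementFactory.getOperatingSystemMXBean();
            // JDK-specific: on OpenJDK/HotSpot on Unix the platform
            // MXBean implements UnixOperatingSystemMXBean.
            if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
                com.sun.management.UnixOperatingSystemMXBean unix =
                        (com.sun.management.UnixOperatingSystemMXBean) os;
                System.out.printf("open fds: %d (max %d)%n",
                        unix.getOpenFileDescriptorCount(),
                        unix.getMaxFileDescriptorCount());
            } else {
                System.out.println("descriptor counts unavailable here");
            }
        }
    }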
