Can you run the following on a clean Drillbit (after restart) to report the number of open files? Please run this before and after executing your queries.
1: lsof -a -p DRILL_PID | wc -l

2016-01-11 14:13 GMT-08:00 Ian Maloney <[email protected]>:
> I don't know a lot about checking the limits. I'm querying Hive (on HDFS),
> and I'm able to perform a count on a Hive table, so I don't think the issue
> is there. The Hive client and Drillbit sit on different nodes, though.
> I ran `ulimit -a` and got:
>
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 31403
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 1024
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> If you have any steps you'd like me to take to help debug, I'd be happy to
> attempt them.

On Monday, January 11, 2016, Hanifi GUNES <[email protected]> wrote:
> Interesting. Can you check your procfs to confirm Drill opens an excessive
> number of file descriptors, after making sure the query is complete in the
> Web UI? If so, we likely have a leak somewhere. Also, can you confirm the
> soft and hard limits for open file descriptors are not set too low on your
> platform?
>
> -Hanifi

2016-01-11 13:13 GMT-08:00 Ian Maloney <[email protected]>:
> Hanifi,
>
> Thanks for the suggestions. I'm not using any concurrency at the moment.
> In the web UI under Profiles I see "No running queries", but the error is
> still happening, even after restarting the bit.

On Monday, January 11, 2016, Hanifi GUNES <[email protected]> wrote:
> - Any ideas on how to keep the (suspected) resource leak from happening?
>
> The answer to this depends on your workload as well. You mentioned you are
> running a lot of queries, so Drill might ordinarily open many descriptors
> to serve the high demand. Of course, this assumes that you do not limit
> the level of concurrency. If so, why don't you try enabling queues to
> bound concurrent execution and run your workload again? [1]
>
> You can always open up the web UI to see whether your queries have
> completed or are hanging around. If you see too many queries pending and
> stuck for a long time, it is a good indicator of the problem I described
> above. [2]
>
> -Hanifi
>
> 1: https://drill.apache.org/docs/enabling-query-queuing/
> 2: https://drill.apache.org/docs/query-profiles/

2016-01-11 10:35 GMT-08:00 Ian Maloney <[email protected]>:
> Hi Abdel,
>
> I just noticed I'm still on v1.2, so maybe upgrading will help. I could
> open a JIRA, if need be.
>
> As far as restarting and reproducing: I restarted, and after running my
> app with the JDBC logic for a while, I get an IOException: Error accessing /
> I paused the app and restarted the specific bit from the JDBC connection
> below, which gave me the "Too many open files" exception again.
>
> Now, restarting that bit won't fix the error.

On Monday, January 11, 2016, Abdel Hakim Deneche <[email protected]> wrote:
> Hi Ian,
>
> Can you open up a JIRA for this? Is it easy to reproduce?
>
> Thanks

On Mon, Jan 11, 2016 at 8:59 AM, Ian Maloney <[email protected]> wrote:
> Hi,
>
> I've been running a lot of queries via JDBC/Drill. I have four Drillbits,
> but I could not get the ZooKeeper JDBC URL to work, so I used:
> jdbc:drill:drillbit=a-bits-hostname
>
> Now I get a SocketException for too many open files, even when accessing
> via the CLI. I imagine I could restart the bits, but for something
> intended for production, that doesn't seem like a viable solution. Any
> ideas on how to keep the (suspected) resource leak from happening?
>
> I'm closing ResultSet, Statement, and Connection after each query.

--
Abdelhakim Deneche
Software Engineer, MapR
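The open-descriptor check requested at the top of the thread can be sketched as follows. This is a minimal sketch assuming a Linux host; `DRILL_PID` stands in for the Drillbit's process ID and falls back to the current shell's PID here only so the script runs stand-alone:

```shell
#!/bin/sh
# Count a process's open file descriptors two ways and compare.
# DRILL_PID is assumed to be exported; fall back to our own PID ($$)
# so the sketch is runnable without a Drillbit present.
DRILL_PID="${DRILL_PID:-$$}"

# lsof prints one line per open descriptor, plus one header line.
lsof_count=$(lsof -a -p "$DRILL_PID" 2>/dev/null | wc -l | tr -d ' ')

# /proc/<pid>/fd holds one entry per open descriptor -- no header line,
# and no dependency on lsof being installed.
proc_count=$(ls "/proc/$DRILL_PID/fd" 2>/dev/null | wc -l | tr -d ' ')

echo "lsof reports:   $lsof_count lines (one header plus one per descriptor)"
echo "procfs reports: $proc_count descriptors"
```

Comparing a snapshot taken on a freshly restarted Drillbit against one taken after the workload (with no queries still running in the Web UI) is what would confirm or rule out a descriptor leak.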
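Separately, the `ulimit -a` output above shows `open files (-n) 1024`, which is a common Linux default and low for a busy server process. A hedged sketch of inspecting the limits and raising the soft limit up to the hard limit for the current session (raising the hard limit itself, or making the change permanent via `/etc/security/limits.conf`, requires root and is distribution-specific):

```shell
#!/bin/sh
# Show the current soft and hard limits on open file descriptors.
echo "soft limit: $(ulimit -Sn)"
echo "hard limit: $(ulimit -Hn)"

# An unprivileged process may raise its soft limit as far as the hard
# limit; only root can raise the hard limit.
hard=$(ulimit -Hn)
if [ "$hard" != "unlimited" ]; then
    ulimit -Sn "$hard"
fi
echo "soft limit now: $(ulimit -Sn)"
```

Note this only affects the shell that runs it (and its children), so the Drillbit would need to be started from a session with the raised limit for it to apply.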
