Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
I'll try running with a higher caching value to see how that changes things, thanks. On Mon, May 23, 2016 at 4:07 PM Stack wrote: > How hard to change the below if only temporarily (Trying to get a datapoint > or two to act on; the short circuit code hasn't changed that we know of...

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Stack
How hard to change the below if only temporarily (Trying to get a datapoint or two to act on; the short circuit code hasn't changed that we know of... perhaps the scan chunking facility in 1.1 has some side effect we've not noticed up to this). If you up the caching to be bigger does it lower the

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
For reference, the Scan backing the job is pretty basic:

  Scan scan = new Scan();
  scan.setCaching(500); // probably too small for the datasize we're dealing with
  scan.setCacheBlocks(false);
  scan.setScanMetricsEnabled(true);
  scan.setMaxVersions(1);
  scan.setTimeRange(startTime, stopTime);

Otherwise

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
I've forced the issue to happen again. netstat takes a while to run on this host while it's happening, but I do not see an abnormal amount of CLOSE_WAIT (compared to other hosts). I forced a larger-than-usual number of regions for the affected table onto the host to speed up the process. File

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Stack
On Mon, May 23, 2016 at 11:19 AM, Bryan Beaudreault < bbeaudrea...@hubspot.com> wrote: > We run MR against many tables in all of our clusters, they mostly have > similar schema definitions though vary in terms of key length, # columns, > etc. This is the only cluster and only table we've seen

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
We run MR against many tables in all of our clusters, they mostly have similar schema definitions though vary in terms of key length, # columns, etc. This is the only cluster and only table we've seen leak so far. It's probably the table with the biggest regions which we MR against, though it's

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Stack
On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault wrote: > Hey everyone, > > We are noticing a file descriptor leak that is only affecting nodes in our > cluster running 5.7.0, not those still running 5.3.8. Translation: roughly hbase-1.2.0+hadoop-2.6.0 vs

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
Thanks Ted, I was not familiar with that JIRA, though I have read it now. The next time it happens I will run the test at the top:

  netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l

However, from what I can tell in the JIRA this should affect almost all versions of HBase above 0.94. The issue we

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Ted Yu
Have you taken a look at HBASE-9393? On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault wrote: > Hey everyone, > > We are noticing a file descriptor leak that is only affecting nodes in our > cluster running 5.7.0, not those still running 5.3.8. I ran an lsof against

File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
Hey everyone, We are noticing a file descriptor leak that is only affecting nodes in our cluster running 5.7.0, not those still running 5.3.8. I ran an lsof against an affected regionserver, and noticed that there were 10k+ unix sockets that are just called "socket", as well as another 10k+ of
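
For anyone wanting to track the descriptor growth described in this thread without waiting on a slow lsof, here is a minimal sketch (not something posted in the thread) that tallies a process's open descriptors by coarse type on Linux. It assumes /proc/<pid>/fd is readable by the user running it; the class name FdTally and the category names are made up for illustration. Note that /proc reports unix and TCP sockets alike as "socket:[inode]", so telling them apart would still require cross-referencing /proc/net/unix or using lsof.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts open file descriptors of a process by coarse type.
public class FdTally {

    // Classify a /proc/<pid>/fd symlink target, e.g. "socket:[12345]" or "/some/file".
    static String classify(String target) {
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("pipe:")) return "pipe";
        if (target.startsWith("anon_inode:")) return "anon_inode";
        return "file";
    }

    // Count how many targets fall into each category (sorted by category name).
    static Map<String, Integer> tally(Iterable<String> targets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String target : targets) {
            counts.merge(classify(target), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass the regionserver pid as the first argument; defaults to this JVM.
        String pid = args.length > 0 ? args[0] : "self";
        List<String> targets = new ArrayList<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : fds) {
                try {
                    targets.add(Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    // The descriptor was closed between listing and readlink; skip it.
                }
            }
        }
        System.out.println(tally(targets));
    }
}
```

Running it periodically against the regionserver pid (for example `java FdTally 21592`, reusing the pid from the netstat command quoted elsewhere in the thread) and comparing the "socket" count over time would show whether the leak is growing.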