I've forced the issue to happen again. netstat takes a while to run on this
host while it's happening, but I do not see an abnormal number of
CLOSE_WAIT connections (compared to other hosts).

I forced a larger-than-usual number of regions for the affected table onto
the host to speed up the process. File descriptors are now growing quite
rapidly, about 8-10 per second.
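
(For anyone who wants to follow along, a quick way to sample the count --
23180 is the regionserver pid from the lsof output below, and you need a
user that can read /proc/<pid>/fd:)

  # Print the open file descriptor count for the regionserver every 5 seconds.
  while true; do
    echo "$(date +%T) $(ls /proc/23180/fd | wc -l)"
    sleep 5
  done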

This is what the lsof output looks like, repeated a couple thousand times:

COMMAND   PID  USER   FD      TYPE    DEVICE    SIZE/OFF    NODE NAME
java    23180 hbase  DEL      REG      0,16              3848784656 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_377023711_1_1702253823
java    23180 hbase  DEL      REG      0,16              3847643924 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_377023711_1_1614925966
java    23180 hbase  DEL      REG      0,16              3847614191 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_377023711_1_888427288

The only thing that varies is the final integer at the end of the path.
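
(For comparison with the bare "socket" entries mentioned in my earlier mail
below, both populations can be counted against the same pid -- these are
plain lsof/grep/awk invocations, nothing HBase-specific:)

  # Deleted short-circuit shared memory segments held by the regionserver:
  lsof -p 23180 | grep -c HadoopShortCircuitShm_DFSClient_NONMAPREDUCE
  # Anonymous unix sockets, which show up with a NAME of just "socket":
  lsof -p 23180 | awk '$NF == "socket"' | wc -l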

> Anything about the job itself that is holding open references or throwing
away files w/o closing them?

The MR job does a TableMapper directly against HBase, which, as far as I
know, uses the HBase RPC layer and does not hit HDFS directly at all. Is it
possible that a long-running scan (one with many, many next() calls) could
keep some references to HDFS open for the duration of the overall scan?
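
(One way to sanity-check that from the MR side is to look at what the task
JVMs actually connect to -- a generic ss/egrep check; the ports below assume
the CDH defaults of 60020 for the regionserver RPC and 50010 for the
datanode transfer port:)

  # Run on an MR worker node while the job is active. If the reads really go
  # over HBase RPC only, you should see connections to :60020 but not :50010.
  ss -tnp | egrep ':60020|:50010'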


On Mon, May 23, 2016 at 2:19 PM Bryan Beaudreault <bbeaudrea...@hubspot.com>
wrote:

> We run MR against many tables in all of our clusters; they mostly have
> similar schema definitions, though they vary in terms of key length, #
> columns, etc. This is the only cluster and only table we've seen leak so
> far. It's probably the table with the biggest regions that we MR against,
> though it's hard to verify that (anyone in engineering can run such a job).
>
> dfs.client.read.shortcircuit.streams.cache.size = 256
>
> Our typical FD count is around 3000. When this hadoop job runs, that can
> climb up to our limit of over 30k if we don't act -- it is a gradual
> build-up over the course of a couple of hours. When we move the regions off
> or kill the job, the FDs will gradually go back down at roughly the same
> pace. It forms a graph in the shape of a pyramid.
>
> We don't use CM; we mostly use the default *-site.xml files. We haven't
> overridden anything related to this. The configs between CDH5.3.8 and
> CDH5.7.0 are identical for us.
>
> On Mon, May 23, 2016 at 2:03 PM Stack <st...@duboce.net> wrote:
>
>> On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com>
>> wrote:
>>
>> > Hey everyone,
>> >
>> > We are noticing a file descriptor leak that is only affecting nodes in
>> > our cluster running 5.7.0, not those still running 5.3.8.
>>
>>
>> Translation: roughly hbase-1.2.0+hadoop-2.6.0 vs
>> hbase-0.98.6+hadoop-2.5.0.
>>
>>
>> > I ran lsof against an affected regionserver and noticed that there were
>> > 10k+ unix sockets that are just called "socket", as well as another 10k+
>> > of the form
>> > "/dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-<int>_1_<int>".
>> > The two seem related based on how closely the counts match.
>> >
>> > We are in the middle of a rolling upgrade from CDH5.3.8 to CDH5.7.0 (we
>> > handled the namenode upgrade separately). The 5.3.8 nodes *do not*
>> > experience this issue. The 5.7.0 nodes *do*. We are holding off upgrading
>> > more regionservers until we can figure this out. I'm not sure if any
>> > intermediate versions between the two have the issue.
>> >
>> > We traced the root cause to a hadoop job running against a basic table:
>> >
>> > 'my-table-1', {TABLE_ATTRIBUTES => {MAX_FILESIZE => '107374182400',
>> > MEMSTORE_FLUSHSIZE => '67108864'}, {NAME => '0', VERSIONS => '50',
>> > BLOOMFILTER => 'NONE', COMPRESSION => 'LZO', METADATA =>
>> > {'COMPRESSION_COMPACT' => 'LZO', 'ENCODE_ON_DISK' => 'true'}}
>> >
>> > This is very similar to all of our other tables (we have many).
>>
>>
>> You are doing MR against some of these also? They have different schemas?
>> No leaks here?
>>
>>
>>
>> > However, its regions are getting up there in size, 40+ GB per region,
>> > compressed. This has not been an issue for us previously.
>> >
>> > The hadoop job is a simple TableMapper job with no special parameters,
>> > though we haven't updated our client yet to the latest (will do that
>> > once we finish the server side). The hadoop job runs on a separate
>> > hadoop cluster, remotely accessing the HBase cluster. It does not do
>> > any other reads or writes, outside of the TableMapper scans.
>> >
>> > Moving the regions off of an affected server, or killing the hadoop job,
>> > causes the file descriptors to gradually go back down to normal.
>> >
>> >
>> > Any ideas?
>> >
>> >
>> Is it just the FD cache running 'normally'? 10k seems like a lot though.
>> 256 seems to be the default in hdfs but maybe it is different in CM or in
>> hbase?
>>
>> What is your dfs.client.read.shortcircuit.streams.cache.size set to?
>> St.Ack
>>
>>
>>
>> > Thanks,
>> >
>> > Bryan
>> >
>>
>
