We run MR against many tables in all of our clusters; they mostly have
similar schema definitions, though they vary in key length, number of
columns, etc. This is the only cluster and the only table where we've seen
the leak so far. It's probably the table with the biggest regions that we
run MR against, though that's hard to verify (anyone in engineering can
run such a job).
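
For reference, the jobs are plain TableMapper scans over a full-table Scan.
A hypothetical skeleton of one (table name, scan settings, and mapper body
are placeholders, not our exact job):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MyTableScanJob {

  // Map-only job: reads each row, does its own processing, writes nothing
  // back to HBase or HDFS.
  static class MyMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // process the row
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "my-table-1 scan");
    job.setJarByClass(MyTableScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // nothing exotic
    scan.setCacheBlocks(false);  // standard advice for MR scans

    TableMapReduceUtil.initTableMapperJob(
        "my-table-1", scan, MyMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}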

dfs.client.read.shortcircuit.streams.cache.size = 256
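
(That's just the value the DFS client resolves from hdfs-default.xml plus our
hdfs-site.xml. A quick sketch of how one could double-check the effective
value, assuming hadoop-hdfs on the classpath; 256 is the shipped default:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class ShortCircuitCacheSize {
  public static void main(String[] args) {
    // HdfsConfiguration loads hdfs-default.xml and any hdfs-site.xml overrides.
    Configuration conf = new HdfsConfiguration();
    int size = conf.getInt("dfs.client.read.shortcircuit.streams.cache.size", 256);
    System.out.println("dfs.client.read.shortcircuit.streams.cache.size = " + size);
  }
}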

Our typical FD count is around 3000. When this Hadoop job runs, it can
climb to our limit of over 30k if we don't act -- a gradual build-up over
the course of a couple of hours. When we move the regions off or kill the
job, the FDs gradually go back down at roughly the same pace, so the graph
forms the shape of a pyramid.
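
The number we're watching is just the process-level open FD count for the
regionserver. We watch it externally, but as a sketch, from inside a JVM it
can be read roughly like this (assumes a Unix JVM that exposes the
com.sun.management MXBean):

import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCount {
  public static void main(String[] args) {
    // Cast assumes a Unix HotSpot/OpenJDK JVM.
    UnixOperatingSystemMXBean os =
        (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
    System.out.println("open fds = " + os.getOpenFileDescriptorCount()
        + " / max = " + os.getMaxFileDescriptorCount());
  }
}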

We don't use CM; we mostly use the default *-site.xml files. We haven't
overridden anything related to this. The configs between CDH5.3.8 and 5.7.0
are identical for us.

On Mon, May 23, 2016 at 2:03 PM Stack <st...@duboce.net> wrote:

> On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
>
> > Hey everyone,
> >
> > We are noticing a file descriptor leak that is only affecting nodes in
> our
> > cluster running 5.7.0, not those still running 5.3.8.
>
>
> Translation: roughly hbase-1.2.0+hadoop-2.6.0 vs hbase-0.98.6+hadoop-2.5.0.
>
>
> > I ran an lsof against
> > an affected regionserver, and noticed that there were 10k+ unix sockets
> > that are just called "socket", as well as another 10k+ of the form
> > "/dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-<int>_1_<int>".
> > The 2 seem related based on how closely the counts match.
> >
> > We are in the middle of a rolling upgrade from CDH5.3.8 to CDH5.7.0 (we
> > handled the namenode upgrade separately).  The 5.3.8 nodes *do not*
> > experience this issue. The 5.7.0 nodes *do*. We are holding off upgrading
> > more regionservers until we can figure this out. I'm not sure if any
> > intermediate versions between the 2 have the issue.
> >
> > We traced the root cause to a hadoop job running against a basic table:
> >
> > 'my-table-1', {TABLE_ATTRIBUTES => {MAX_FILESIZE => '107374182400',
> > MEMSTORE_FLUSHSIZE => '67108864'}, {NAME => '0', VERSIONS => '50',
> > BLOOMFILTER => 'NONE', COMPRESSION => 'LZO', METADATA =>
> > {'COMPRESSION_COMPACT' => 'LZO', 'ENCODE_ON_DISK' => 'true'}}
> >
> > This is very similar to all of our other tables (we have many).
>
>
> You are doing MR against some of these also? They have different schemas?
> No leaks here?
>
>
>
> > However,
> > its regions are getting up there in size, 40+ GB per region, compressed.
> > This has not been an issue for us previously.
> >
> > The hadoop job is a simple TableMapper job with no special parameters,
> > though we haven't updated our client yet to the latest (will do that once
> > we finish the server side). The hadoop job runs on a separate hadoop
> > cluster, remotely accessing the HBase cluster. It does not do any other
> > reads or writes, outside of the TableMapper scans.
> >
> > Moving the regions off of an affected server, or killing the hadoop job,
> > causes the file descriptors to gradually go back down to normal.
> >
> >
> > Any ideas?
> >
> >
> Is it just the FD cache running 'normally'? 10k seems like a lot though.
> 256 seems to be the default in hdfs but maybe it is different in CM or in
> hbase?
>
> What is your dfs.client.read.shortcircuit.streams.cache.size set to?
> St.Ack
>
>
>
> > Thanks,
> >
> > Bryan
> >
>
