I'll try to run with a higher caching to see how that changes things, thanks
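
Concretely, here's roughly what I'm planning to try on the job side -- the new
caching value is a guess, and we'd only flip cacheBlocks on if it won't evict
blocks other readers need:

import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
scan.setCaching(5000);             // up from 500, per the suggestion below
scan.setCacheBlocks(true);         // only if it doesn't blow the cache for others
scan.setScanMetricsEnabled(true);
scan.setMaxVersions(1);
scan.setTimeRange(startTime, stopTime);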

On Mon, May 23, 2016 at 4:07 PM Stack <st...@duboce.net> wrote:

> How hard would it be to change the below, if only temporarily? (Trying to get
> a datapoint or two to act on; the short-circuit code hasn't changed that we
> know of... perhaps the scan chunking facility in 1.1 has some side effect
> we've not noticed up to this point.)
>
> If you up the caching to something bigger, does it lower the rate of FD leak
> creation?
>
> If you cache the blocks, assuming it does not blow the cache for others,
> does that make a difference?
>
> Hang on... will be back in a sec... just sending this in the meantime...
>
> St.Ack
>
> On Mon, May 23, 2016 at 12:20 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
>
> > For reference, the Scan backing the job is pretty basic:
> >
> > Scan scan = new Scan();
> > scan.setCaching(500); // probably too small for the datasize we're dealing with
> > scan.setCacheBlocks(false);
> > scan.setScanMetricsEnabled(true);
> > scan.setMaxVersions(1);
> > scan.setTimeRange(startTime, stopTime);
> >
> > Otherwise it is using the out-of-the-box TableInputFormat.
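> >
> > In other words, something along the lines of the stock TableMapReduceUtil
> > setup -- the mapper and output types here are placeholders, not our actual
> > classes:
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> > import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> > import org.apache.hadoop.mapreduce.Job;
> >
> > Configuration conf = HBaseConfiguration.create();
> > Job job = Job.getInstance(conf, "my-table-1 scan");
> > // Standard TableMapper wiring over the Scan above; TableInputFormat is
> > // configured for us by initTableMapperJob.
> > TableMapReduceUtil.initTableMapperJob(
> >     "my-table-1",                  // input table
> >     scan,                          // the Scan shown above
> >     MyMapper.class,                // placeholder mapper extending TableMapper
> >     ImmutableBytesWritable.class,  // placeholder output key class
> >     Result.class,                  // placeholder output value class
> >     job);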
> >
> >
> >
> > On Mon, May 23, 2016 at 3:13 PM Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
> >
> > > I've forced the issue to happen again. netstat takes a while to run on
> > > this host while it's happening, but I do not see an abnormal amount of
> > > CLOSE_WAIT (compared to other hosts).
> > >
> > > I forced more than the usual number of regions for the affected table onto
> > > the host to speed up the process. File descriptors are now growing quite
> > > rapidly, about 8-10 per second.
> > >
> > > This is what lsof looks like, multiplied by a couple thousand:
> > >
> > > COMMAND   PID  USER   FD   TYPE  DEVICE  SIZE/OFF  NODE        NAME
> > > java    23180 hbase   DEL   REG    0,16            3848784656  /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_377023711_1_1702253823
> > > java    23180 hbase   DEL   REG    0,16            3847643924  /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_377023711_1_1614925966
> > > java    23180 hbase   DEL   REG    0,16            3847614191  /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_377023711_1_888427288
> > >
> > > The only thing that varies is the last int on the end.
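> > >
> > > In case anyone wants to watch the count climb without waiting on a slow
> > > lsof each time, a cheap poller along these lines would do it (the pid is
> > > the regionserver's, e.g. 23180 above, and it needs to run as a user that
> > > can read /proc/<pid>/fd):
> > >
> > > import java.nio.file.DirectoryStream;
> > > import java.nio.file.Files;
> > > import java.nio.file.Path;
> > > import java.nio.file.Paths;
> > >
> > > public class FdWatch {
> > >   public static void main(String[] args) throws Exception {
> > >     String pid = args[0];                    // e.g. "23180"
> > >     Path fdDir = Paths.get("/proc", pid, "fd");
> > >     while (true) {
> > >       long count = 0;
> > >       // Each entry under /proc/<pid>/fd is one open file descriptor.
> > >       // Note: the DEL /dev/shm lines in lsof are mappings, so they may
> > >       // not all show up in this count.
> > >       try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
> > >         for (Path ignored : fds) {
> > >           count++;
> > >         }
> > >       }
> > >       System.out.println(System.currentTimeMillis() + " open fds: " + count);
> > >       Thread.sleep(1000);
> > >     }
> > >   }
> > > }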
> > >
> > > > Anything about the job itself that is holding open references or
> > > > throwing away files w/o closing them?
> > >
> > > The MR job does a TableMapper directly against HBase, which as far as I
> > > know uses the HBase RPC and does not hit HDFS directly at all. Is it
> > > possible that a long-running scan (one with many, many next() calls)
> > > could keep some references to HDFS open for the duration of the overall
> > > scan?
> > >
> > >
> > > On Mon, May 23, 2016 at 2:19 PM Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
> > >
> > >> We run MR against many tables in all of our clusters; they mostly have
> > >> similar schema definitions, though they vary in terms of key length,
> > >> number of columns, etc. This is the only cluster and only table we've
> > >> seen leak so far. It's probably the table with the biggest regions that
> > >> we MR against, though it's hard to verify that (anyone in engineering
> > >> can run such a job).
> > >>
> > >> dfs.client.read.shortcircuit.streams.cache.size = 256
> > >>
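> > >> That's just the stock value. If we end up experimenting with it, it would
> > >> be a matter of bumping something like the below in the regionservers'
> > >> hdfs-site.xml (expressed here via the client Configuration just for
> > >> illustration -- the numbers are guesses, not something we've validated):
> > >>
> > >> import org.apache.hadoop.conf.Configuration;
> > >> import org.apache.hadoop.hbase.HBaseConfiguration;
> > >>
> > >> Configuration conf = HBaseConfiguration.create();
> > >> // Default is 256 entries; a bigger short-circuit stream cache with a
> > >> // shorter expiry is one knob we could try.
> > >> conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 1024);
> > >> conf.setLong("dfs.client.read.shortcircuit.streams.cache.expiry.ms", 60000);
> > >>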
> > >> Our typical FD count is around 3000. When this hadoop job runs, that
> > >> can climb up to our limit of over 30k if we don't act -- it is a
> > >> gradual build-up over the course of a couple of hours. When we move the
> > >> regions off or kill the job, the FDs gradually go back down at roughly
> > >> the same pace. It forms a graph in the shape of a pyramid.
> > >>
> > >> We don't use CM; we mostly use the default *-site.xml. We haven't
> > >> overridden anything related to this. The configs between CDH5.3.8 and
> > >> 5.7.0 are identical for us.
> > >>
> > >> On Mon, May 23, 2016 at 2:03 PM Stack <st...@duboce.net> wrote:
> > >>
> > >>> On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
> > >>>
> > >>> > Hey everyone,
> > >>> >
> > >>> > We are noticing a file descriptor leak that is only affecting nodes
> > >>> > in our cluster running 5.7.0, not those still running 5.3.8.
> > >>>
> > >>>
> > >>> Translation: roughly hbase-1.2.0+hadoop-2.6.0 vs
> > >>> hbase-0.98.6+hadoop-2.5.0.
> > >>>
> > >>>
> > >>> > I ran an lsof against an affected regionserver and noticed that there
> > >>> > were 10k+ unix sockets that are just called "socket", as well as
> > >>> > another 10k+ of the form
> > >>> > "/dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-<int>_1_<int>".
> > >>> > The two seem related, based on how closely the counts match.
> > >>> >
> > >>> > We are in the middle of a rolling upgrade from CDH5.3.8 to CDH5.7.0
> > >>> > (we handled the namenode upgrade separately). The 5.3.8 nodes *do
> > >>> > not* experience this issue; the 5.7.0 nodes *do*. We are holding off
> > >>> > upgrading more regionservers until we can figure this out. I'm not
> > >>> > sure if any intermediate versions between the two have the issue.
> > >>> >
> > >>> > We traced the root cause to a hadoop job running against a basic
> > >>> > table:
> > >>> >
> > >>> > 'my-table-1', {TABLE_ATTRIBUTES => {MAX_FILESIZE => '107374182400',
> > >>> > MEMSTORE_FLUSHSIZE => '67108864'}, {NAME => '0', VERSIONS => '50',
> > >>> > BLOOMFILTER => 'NONE', COMPRESSION => 'LZO', METADATA =>
> > >>> > {'COMPRESSION_COMPACT' => 'LZO', 'ENCODE_ON_DISK' => 'true'}}
> > >>> >
> > >>> > This is very similar to all of our other tables (we have many).
> > >>>
> > >>>
> > >>> You are doing MR against some of these also? They have different
> > >>> schemas? No leaks here?
> > >>>
> > >>>
> > >>>
> > >>> > However, its regions are getting up there in size, 40+ GB per
> > >>> > region, compressed. This has not been an issue for us previously.
> > >>> >
> > >>> > The hadoop job is a simple TableMapper job with no special
> > >>> > parameters, though we haven't updated our client yet to the latest
> > >>> > (we will do that once we finish the server side). The hadoop job
> > >>> > runs on a separate hadoop cluster, remotely accessing the HBase
> > >>> > cluster. It does not do any other reads or writes outside of the
> > >>> > TableMapper scans.
> > >>> >
> > >>> > Moving the regions off of an affected server, or killing the hadoop
> > >>> > job, causes the file descriptors to gradually go back down to normal.
> > >>> >
> > >>> >
> > >>> > Any ideas?
> > >>> >
> > >>> >
> > >>> Is it just the FD cache running 'normally'? 10k seems like a lot
> > >>> though. 256 seems to be the default in hdfs but maybe it is different
> > >>> in CM or in hbase?
> > >>>
> > >>> What is your dfs.client.read.shortcircuit.streams.cache.size set to?
> > >>> St.Ack
> > >>>
> > >>>
> > >>>
> > >>> > Thanks,
> > >>> >
> > >>> > Bryan
> > >>> >
> > >>>
> > >>
> >
>
