Haven't had a chance to run netperf, but spotted messages in syslog of the form:
Oct 25 21:03:22 ... kernel: [107058.190743] net_ratelimit: 136 callbacks suppressed
Oct 25 21:03:22 ... kernel: [107058.190746] nf_conntrack: table full, dropping packet.

which suggests RPC requests may indeed be being dropped. There are ~16000 connections for port 50060, i.e. the tasktracker. I'll try raising the conntrack max and see what effect that has.

On 24 October 2013 23:02, Harry Waye <[email protected]> wrote:

> Got it! Re. 50% utilisation, I forgot to mention that 6 cores does not
> include hyper-threading. Foolish I know, but that would explain CPU0 being
> at 50%. The nodes are as stated in
> http://www.hetzner.de/en/hosting/produkte_rootserver/ex10 bar the RAID1.
>
> On 24 October 2013 22:50, Jean-Marc Spaggiari <[email protected]> wrote:
>
>> Remote calls to a server. Just forget about it ;) Please verify the
>> network bandwidth between your nodes.
>>
>> 2013/10/24 Harry Waye <[email protected]>
>>
>>> Excuse the ignorance, RCP?
>>>
>>> On 24 October 2013 22:28, Jean-Marc Spaggiari <[email protected]> wrote:
>>>
>>>> Your nodes are almost 50% idle... Might be something else. Sounds like
>>>> it's not your disks nor your CPU... Maybe too many RCPs?
>>>>
>>>> Have you investigated on your network side? netperf might be a good
>>>> help for you.
>>>>
>>>> JM
>>>>
>>>> 2013/10/24 Harry Waye <[email protected]>
>>>>
>>>>> p.s. I guess this is more turning into a general hadoop issue, but
>>>>> I'll keep the discussion here seeing that I have an audience, unless
>>>>> there are objections.
>>>>>
>>>>> On 24 October 2013 22:02, Harry Waye <[email protected]> wrote:
>>>>>
>>>>>> So just a short update, I'll read into it a little more tomorrow.
>>>>>> This is from three of the nodes:
>>>>>> https://gist.github.com/hazzadous/1264af7c674e1b3cf867
>>>>>>
>>>>>> The first is the grey guy.
>>>>>> Just glancing at it, it looks to fluctuate more than the others. I
>>>>>> guess that could suggest that there are some issues with reading from
>>>>>> the disks. Interestingly, it's the only one that doesn't have smartd
>>>>>> installed, which alerts us on changes for the other nodes. I suspect
>>>>>> there's probably some mileage in checking its SMART attributes. Will
>>>>>> do that tomorrow though.
>>>>>>
>>>>>> Out of curiosity, how do people normally monitor disk issues? I'm
>>>>>> going to set up collectd to push various things from smartctl
>>>>>> tomorrow; at the moment all we do is receive emails, which is mostly
>>>>>> noise about problem sector counts increasing +1.
>>>>>>
>>>>>> On 24 October 2013 19:40, Jean-Marc Spaggiari <[email protected]> wrote:
>>>>>>
>>>>>>> Can you try vmstat 2? 2 is the interval in seconds at which it will
>>>>>>> display the disk usage. On the extract here, nothing is running;
>>>>>>> only 8% is used (1% disk IO, 6% user, 1% sys).
>>>>>>>
>>>>>>> Run it on 2 or 3 different nodes while you are putting the load on
>>>>>>> the cluster, take a look at the 4 last numbers, and see what the
>>>>>>> value of the last one is.
>>>>>>>
>>>>>>> On the usercpu0 graph, who is the gray guy showing high?
>>>>>>>
>>>>>>> JM
>>>>>>>
>>>>>>> 2013/10/24 Harry Waye <[email protected]>
>>>>>>>
>>>>>>>> Ok I'm running a load job atm, I've added some possibly
>>>>>>>> incomprehensible coloured lines to the graph: http://goo.gl/cUGCGG
>>>>>>>>
>>>>>>>> This is actually with one fewer node due to decommissioning to
>>>>>>>> replace a disk, hence I guess the reason for one squiggly line
>>>>>>>> showing no disk activity.
>>>>>>>> I've included only the cpu stats for CPU0 from each node. The last
>>>>>>>> graph should read "Memory Used". vmstat from one of the nodes:
>>>>>>>>
>>>>>>>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>>>>>>>  r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy id wa
>>>>>>>>  6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6  1 91  1
>>>>>>>>
>>>>>>>> To me the wait doesn't seem that high. Job stats are
>>>>>>>> http://goo.gl/ZYdUKp, the job setup is
>>>>>>>> https://gist.github.com/hazzadous/ac57a384f2ab685f07f6
>>>>>>>>
>>>>>>>> Does anything jump out at you?
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> H
>>>>>>>>
>>>>>>>> On 24 October 2013 16:16, Harry Waye <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi JM
>>>>>>>>>
>>>>>>>>> I took a snapshot on the initial run, before the changes:
>>>>>>>>> https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png
>>>>>>>>>
>>>>>>>>> Good timing, disks appear to be exploding (ATA errors) atm, thus
>>>>>>>>> I'm decommissioning and reprovisioning with new disks. I'll be
>>>>>>>>> reprovisioning without RAID (it's software RAID, just to compound
>>>>>>>>> the issue), although I'm not sure how I'll go about migrating all
>>>>>>>>> nodes. I guess I'd need to put more correctly specced nodes in the
>>>>>>>>> rack and decommission the existing. Makes diff. to
>>>>>>>>>
>>>>>>>>> We're using hetzner at the moment, which may not have been a good
>>>>>>>>> choice.
>>>>>>>>> Has anyone had any experience with them wrt. Hadoop? They offer 7
>>>>>>>>> and 15 disk options, but are low on the cpu front (quad core). Our
>>>>>>>>> workload will be, I assume, on the high side. There's also an
>>>>>>>>> 8-disk Dell PowerEdge which is a little more powerful. What
>>>>>>>>> hosting providers would people recommend? (And what would be the
>>>>>>>>> strategy for migrating?)
>>>>>>>>>
>>>>>>>>> Anyhow, when I have things more stable I'll have a look at
>>>>>>>>> checking out what's using the cpu. In the meantime, can anything
>>>>>>>>> be gleaned from the above snap?
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> H
>>>>>>>>>
>>>>>>>>> On 24 October 2013 15:14, Jean-Marc Spaggiari <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Harry,
>>>>>>>>>>
>>>>>>>>>> Do you have more details on the exact load? Can you run vmstat
>>>>>>>>>> and see what kind of load it is? Is it user? cpu? wio?
>>>>>>>>>>
>>>>>>>>>> I suspect your disks to be the issue. There are two things here.
>>>>>>>>>>
>>>>>>>>>> First, we don't recommend RAID for the HDFS/HBase disks. The best
>>>>>>>>>> is to simply mount the disks on 2 mount points and give them to
>>>>>>>>>> HDFS. Second, 2 disks per node is very low. Even on a dev cluster
>>>>>>>>>> it's not recommended. In production, you should go with 12 or
>>>>>>>>>> more.
>>>>>>>>>>
>>>>>>>>>> So with only 2 disks in RAID, I suspect your WIO to be high,
>>>>>>>>>> which is what might slow your process.
>>>>>>>>>>
>>>>>>>>>> Can you take a look in that direction?
>>>>>>>>>> If it's not that, we will continue to investigate ;)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> JM
>>>>>>>>>>
>>>>>>>>>> 2013/10/23 Harry Waye <[email protected]>
>>>>>>>>>>
>>>>>>>>>>> I'm trying to load data into hbase using HFileOutputFormat and
>>>>>>>>>>> incremental bulk load but am getting rather lackluster
>>>>>>>>>>> performance: 10h for ~0.5TB of data, ~50000 blocks. This is
>>>>>>>>>>> being loaded into a table that has 2 families, 9 columns, 2500
>>>>>>>>>>> regions and is ~10TB in size. Keys are md5 hashes and regions
>>>>>>>>>>> are pretty evenly spread. The majority of time appears to be
>>>>>>>>>>> spent in the reduce phase, with the map phase completing very
>>>>>>>>>>> quickly. The network doesn't appear to be saturated, but the
>>>>>>>>>>> load is consistently at 6, which is the number of reduce tasks
>>>>>>>>>>> per node.
>>>>>>>>>>>
>>>>>>>>>>> 12 hosts (6 cores, 2 disks as RAID0, 1GB eth, no one else on the
>>>>>>>>>>> rack).
>>>>>>>>>>>
>>>>>>>>>>> MR conf: 6 mappers, 6 reducers per node.
>>>>>>>>>>>
>>>>>>>>>>> I spoke to someone on IRC and they recommended reducing job
>>>>>>>>>>> output replication to 1, and reducing the number of mappers,
>>>>>>>>>>> which I reduced to 2. Reducing replication appeared not to make
>>>>>>>>>>> any difference; reducing reducers appeared just to slow the job
>>>>>>>>>>> down. I'm going to have a look at running the benchmarks
>>>>>>>>>>> mentioned on Michael Noll's blog and see what that turns up.
>>>>>>>>>>> I guess some questions I have are:
>>>>>>>>>>>
>>>>>>>>>>> How does the global number/size of blocks affect perf.? (I have
>>>>>>>>>>> a lot of 10mb files, which are the input files.)
>>>>>>>>>>>
>>>>>>>>>>> How does the job-local number/size of input blocks affect perf.?
>>>>>>>>>>>
>>>>>>>>>>> What is actually happening in the reduce phase that requires so
>>>>>>>>>>> much CPU? I assume the actual construction of HFiles isn't
>>>>>>>>>>> intensive.
>>>>>>>>>>>
>>>>>>>>>>> Ultimately, how can I improve performance?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
--
Harry Waye, Co-founder/CTO
[email protected]
+44 7890 734289

Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>

---
Arachnys Information Services Limited is a company registered in England &
Wales. Company number: 7269723. Registered office: 40 Clarendon St,
Cambridge, CB1 1JX.
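[For reference: the "nf_conntrack: table full, dropping packet" messages at the top of the thread mean the kernel's connection-tracking table has hit its limit. A minimal sketch of checking and raising it, assuming a Linux box with the nf_conntrack module loaded; the 262144 value is illustrative, not something recommended in the thread:]

```shell
# Current usage vs. the table limit (read-only, no root needed):
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

# Raise the limit as root; size the value to available memory:
sysctl -w net.netfilter.nf_conntrack_max=262144

# Persist across reboots:
echo 'net.netfilter.nf_conntrack_max = 262144' >> /etc/sysctl.conf
```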
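[JM's "take a look at the 4 last numbers" refers to vmstat's us/sy/id/wa CPU columns. A small sketch pulling them out of the sample line Harry posted; the column positions assume the default 16-column vmstat output, with no -a or -w flags:]

```shell
# The vmstat data line quoted in the thread:
line=" 6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6  1 91  1"

# Fields 13-16 of the default layout are user, system, idle, and IO-wait CPU %:
echo "$line" | awk '{printf "user=%s%% sys=%s%% idle=%s%% wait=%s%%\n", $13, $14, $15, $16}'
# -> user=6% sys=1% idle=91% wait=1%
```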
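[On the disk-monitoring question: one common approach is a periodic smartctl sweep whose output feeds alerting or collectd. A hedged sketch of the parsing step; the attribute line here is a canned sample so the snippet is self-contained, and real runs would need smartmontools installed and root access:]

```shell
#!/bin/sh
# Canned sample of a `smartctl -A` attribute table line (a live check would
# run `smartctl -A /dev/sda` as root, per device):
sample="  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12"

# Column 2 is the attribute name, column 10 its raw value; alert when non-zero.
echo "$sample" \
  | awk '$2 == "Reallocated_Sector_Ct" && $10+0 > 0 {print "reallocated sectors: " $10}'
# -> reallocated sectors: 12
```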
