I've personally never seen full CPU utilization during pure ingest; typically the bottleneck has been I/O. Under a heavy ingest load, the majority of steady-state CPU time probably goes to compression (unless you have custom constraints running), so it depends on which compression algorithm you have selected. There is probably a measurable contribution from inserting into the in-memory map. Otherwise, not much computation happens per mutation during ingest.
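For what it's worth, the codec is a per-table property and is cheap to experiment with. A minimal sketch using the 1.8 Java client API; the instance name, ZooKeeper hosts, credentials, and table name below are placeholders:

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class CompressionSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust for your cluster.
        Connector conn = new ZooKeeperInstance("accumulo", "zk1:2181,zk2:2181,zk3:2181")
                .getConnector("root", new PasswordToken("secret"));

        // Trade file size for CPU: snappy is typically much cheaper to compress
        // than the default gz. Only newly written RFiles pick up the new codec.
        conn.tableOperations().setProperty("ingest_test", "table.file.compress.type", "snappy");
    }
}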
On Thu, Jul 6, 2017 at 8:18 AM, Dave Marion <[email protected]> wrote:

> That's a good point. I would also look at increasing tserver.total.mutation.queue.max. Are you seeing hold times? If not, I would keep pushing harder until you do, then move to multiple tablet servers. Do you have any GC logs?
>
> On July 6, 2017 at 4:47 AM Cyrille Savelief <[email protected]> wrote:
>
> Are you sure Accumulo is not waiting for your app's data? There might be GC pauses in your ingest code (we have already experienced that).
>
> On Thu, Jul 6, 2017 at 10:32 AM, Massimilian Mattetti <[email protected]> wrote:
>
>> Thank you all for the suggestions.
>>
>> About the native memory map, I checked the logs on each tablet server and it was loaded correctly (of course tserver.memory.maps.native.enabled was set to true), so GC pauses should not be the problem after all. I managed to get a much better ingestion graph by reducing the native map size to 2GB and increasing the number of Batch Writer threads from the default of 3 (which was really bad for my configuration) to 10 (I think it does not make sense to have more threads than tablet servers, am I right?).
>>
>> The configuration that I used for the table is:
>> "table.file.replication": "2",
>> "table.compaction.minor.logs.threshold": "3",
>> "table.durability": "flush",
>> "table.split.threshold": "1G"
>>
>> while for the tablet servers it is:
>> "tserver.wal.blocksize": "1G",
>> "tserver.walog.max.size": "2G",
>> "tserver.memory.maps.max": "2G",
>> "tserver.compaction.minor.concurrent.max": "50",
>> "tserver.compaction.major.concurrent.max": "20",
>> "tserver.wal.replication": "2",
>> "tserver.compaction.major.thread.files.open.max": "15"
>>
>> The new graph:
>>
>> I still have the problem of CPU usage that is less than 20%, so I am thinking of running multiple tablet servers per node (5 or 10, say) in order to maximize CPU usage. Besides that, I do not have any other ideas for how to stress those servers with ingestion. Any suggestions are very welcome. Meanwhile, thank you all again for your help.
>>
>> Best Regards,
>> Massimiliano
>>
>> From: Jonathan Wonders <[email protected]>
>> To: [email protected]
>> Date: 06/07/2017 04:01
>> Subject: Re: maximize usage of cluster resources during ingestion
>> ------------------------------
>>
>> Hi Massimilian,
>>
>> Are you seeing held commits during the ingest pauses? Just based on having looked at many similar graphs in the past, this might be one of the major culprits. A tablet server has a memory region with a bounded size (tserver.memory.maps.max) where it buffers data that has not yet been written to RFiles (through the process of minor compaction). The region is segmented by tablet, and each tablet can have a buffer that is undergoing ingest as well as a buffer that is undergoing minor compaction. A memory manager decides when to initiate minor compactions for the tablet buffers, and the default implementation tries to keep the memory region 80-90% full while preferring to compact the largest tablet buffers. Creating larger RFiles during minor compaction should lead to fewer major compactions. During a minor compaction, the tablet buffer still "consumes" memory within the in-memory map, and high ingest rates can lead to exhausting the remaining capacity.
>> The default memory manager uses an adaptive strategy to predict the expected memory usage and makes compaction decisions that should maintain some free memory. Batch writers can be bursty and a bit unpredictable, which could throw off these estimates. Also, depending on the ingest profile, sometimes an in-memory tablet buffer will consume a large percentage of the total buffer. This leads to long minor compactions when the buffer size is large, which can allow ingest enough time to exhaust the buffer before that memory can be reclaimed. When a tablet server has to block ingest, it can affect client ingest rates to other tablet servers due to the way that batch writers work. This can lead to other tablet servers underestimating future ingest rates, which can further exacerbate the problem.
>>
>> There are some configuration changes that could reduce the severity of held commits, although they might reduce peak ingest rates. Reducing the in-memory map size can reduce the maximum pause time due to held commits. Adding additional tablets should help avoid the problem of a single tablet buffer consuming a large percentage of the memory region. It might be better to aim for ~20 tablets per server if your problem allows for it. It is also possible to replace the memory manager with a custom one. I've tried this in the past and have seen stability improvements by making the memory thresholds less aggressive (50-75% full). This did reduce peak ingest rate in some cases, but that was a reasonable tradeoff.
>>
>> Based on your current configuration, if a tablet server is serving 4 tablets and has a 32GB buffer, your first minor compactions will be at least 8GB, and they will probably grow larger over time until the tablets naturally split. Consider how long it would take to write this RFile compared to your peak ingest rate. As others have suggested, make sure to use the native maps. Based on your current JVM heap size, using the Java in-memory map would probably lead to OOME or very bad GC performance.
>>
>> Accumulo can trace minor compaction durations, so you can get a feel for max pause times or measure the effect of configuration changes.
>>
>> Cheers,
>> --Jonathan
>>
>> On Wed, Jul 5, 2017 at 7:16 PM, Dave Marion <[email protected]> wrote:
>>
>> Based on what Cyrille said, I would look at garbage collection; specifically, I would look at how many of your newly allocated objects spill into the old generation before they are flushed to disk. Additionally, I would turn off the debug log, or log to SSDs if you have them. Another thought, seeing that you have 256GB RAM per node, is to run multiple tablet servers per node. Do you have 10 threads on your Batch Writers? What about the Batch Writer latency? Is it set too low, such that you are not filling the buffer?
>>
>> From: Massimilian Mattetti [mailto:[email protected]]
>> Sent: Wednesday, July 05, 2017 8:37 AM
>> To: [email protected]
>> Subject: maximize usage of cluster resources during ingestion
>>
>> Hi all,
>>
>> I have an Accumulo 1.8.1 cluster made up of 12 bare-metal servers. Each server has 256GB of RAM and 2 x 10-core CPUs. 2 machines are used as masters (running the HDFS NameNodes, Accumulo Master, and Monitor). The other 10 machines have 12 disks of 1TB each (11 used by the HDFS DataNode process) and are running Accumulo TServer processes.
>> All the machines are connected via a 10Gb network, and 3 of them are running ZooKeeper. I have run some heavy ingestion tests on this cluster but I have never been able to reach more than 20% CPU usage on each tablet server. I am running an ingestion process (using batch writers) on each data node. The table is pre-split in order to have 4 tablets per tablet server. Monitoring the network, I have seen that data is received/sent from each node at a peak rate of about 120MB/s / 100MB/s, while the aggregated disk write throughput on each tablet server is around 120MB/s.
>>
>> The table configuration I am playing with is:
>> "table.file.replication": "2",
>> "table.compaction.minor.logs.threshold": "10",
>> "table.durability": "flush",
>> "table.file.max": "30",
>> "table.compaction.major.ratio": "9",
>> "table.split.threshold": "1G"
>>
>> while the tablet server configuration is:
>> "tserver.wal.blocksize": "2G",
>> "tserver.walog.max.size": "8G",
>> "tserver.memory.maps.max": "32G",
>> "tserver.compaction.minor.concurrent.max": "50",
>> "tserver.compaction.major.concurrent.max": "8",
>> "tserver.total.mutation.queue.max": "50M",
>> "tserver.wal.replication": "2",
>> "tserver.compaction.major.thread.files.open.max": "15"
>>
>> The tablet server heap has been set to 32GB.
>>
>> From the Monitor UI:
>>
>> As you can see, I have a lot of valleys in which the ingestion rate reaches 0. What would be a good procedure to identify the bottleneck that causes the zero-ingestion-rate periods?
>> Thanks.
>>
>> Best Regards,
>> Max
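Regarding the Batch Writer questions in the thread above (thread count, buffer size, and latency), those knobs all live on BatchWriterConfig. A rough sketch of how I would configure a writer for this kind of load; the values are starting points to experiment with, not recommendations, and the table name and Connector come from the same placeholder setup as the earlier sketch:

import java.util.concurrent.TimeUnit;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class IngestSketch {
    static void writeSome(Connector conn) throws Exception {
        BatchWriterConfig cfg = new BatchWriterConfig()
                .setMaxWriteThreads(10)                 // roughly one per tablet server
                .setMaxMemory(64 * 1024 * 1024)         // 64MB client-side buffer
                .setMaxLatency(120, TimeUnit.SECONDS);  // high enough that flushes are size-driven, not time-driven

        BatchWriter writer = conn.createBatchWriter("ingest_test", cfg);
        try {
            Mutation m = new Mutation("row_0001");
            m.put("cf", "cq", new Value("value".getBytes()));
            writer.addMutation(m);
        } finally {
            writer.close(); // flushes whatever is still buffered
        }
    }
}

A latency that is too low (or a buffer that is too small) means the client sends many tiny batches and never fills the buffer, which is exactly the situation Dave was asking about.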
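On the suggestion above of aiming for ~20 tablets per server: rather than waiting for table.split.threshold to create them, you can pre-split so each server holds that many tablets from the start. A sketch, assuming (purely as an illustration of the row key design) that keys are evenly distributed over 4-digit hex prefixes; substitute split points that match your actual key space:

import java.util.TreeSet;
import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class PreSplitSketch {
    static void preSplit(Connector conn) throws Exception {
        // 10 tablet servers * 20 tablets each = 200 tablets, i.e. 199 split points.
        int tablets = 200;
        TreeSet<Text> splits = new TreeSet<>();
        for (int i = 1; i < tablets; i++) {
            splits.add(new Text(String.format("%04x", i * 0x10000 / tablets)));
        }
        conn.tableOperations().addSplits("ingest_test", splits);
    }
}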
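Finally, on the question of how to identify the bottleneck behind the zero-ingest valleys: one cheap first step on the client side is to time the writer calls themselves. If addMutation (or flush) blocks for long stretches, the backpressure is coming from the servers (held commits, slow minor compactions); if the calls return quickly and the gaps are in your own pipeline, the problem is upstream, as Cyrille suggested. A rough sketch of that instrumentation; the 500 ms threshold and the logging are arbitrary:

import java.util.concurrent.TimeUnit;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Mutation;

public class TimedWrites {
    // Coarse stall detector around addMutation; 'writer' comes from the earlier sketch.
    static void addTimed(BatchWriter writer, Mutation m) throws Exception {
        long start = System.nanoTime();
        writer.addMutation(m);
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        if (millis > 500) {
            // Long stalls here usually mean the client buffer is full and the
            // tablet servers are not draining it fast enough.
            System.out.printf("addMutation blocked for %d ms%n", millis);
        }
    }
}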
