Hi all,
I ran a few experiments over the last few days trying to identify the
bottleneck in the ingestion process.
- Running 10 tservers per node instead of only one gave me only a
marginal performance improvement of about 15%.
- Running the ingestor processes from the two masters gives the same
performance as running one ingestor process on each tablet server (10
ingestors).
- Neither the network limit (10 Gb network) nor the disk throughput
limit has been reached (1GB/s per node was reached while running the
TestDFSIO benchmark on HDFS).
- CPU is always around 20% on each tserver.
- Changing compression from gz to Snappy did not provide any benefit.
- Increasing tserver.total.mutation.queue.max to 200MB actually
decreased performance.
I am going to run some ingestion experiments with Kudu over the next few
days, but any other suggestions on how to improve performance on
Accumulo are very welcome.
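For reference, changes of this kind can be applied from the Accumulo
shell, e.g. (the table name is just a placeholder):

    config -t ingest_table -s table.file.compress.type=snappy
    config -s tserver.total.mutation.queue.max=200M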
Thanks.
Best Regards,
Massimiliano
From: Jonathan Wonders <[email protected]>
To: [email protected], Dave Marion <[email protected]>
Date: 07/07/2017 04:02
Subject: Re: maximize usage of cluster resources during ingestion
------------------------------------------------------------------------
I've personally never seen full CPU utilization during pure ingest.
Typically the bottleneck has been I/O related. The majority of
steady-state CPU utilization under a heavy ingest load is probably due
to compression unless you have custom constraints running. This can
depend on the compression algorithm you have selected. There is
probably a measurable contribution from inserting into the in-memory
map. Otherwise, not much computation occurs during ingest per mutation.
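For context, a custom constraint is just per-mutation code that the
tserver runs at ingest time, so it adds CPU in proportion to your write
rate. A purely illustrative sketch (not something from this thread)
would look roughly like:

    import java.util.Collections;
    import java.util.List;
    import org.apache.accumulo.core.constraints.Constraint;
    import org.apache.accumulo.core.data.Mutation;

    // Illustrative constraint: reject mutations whose row id is longer than 1KB.
    public class MaxRowSizeConstraint implements Constraint {
      private static final short ROW_TOO_LONG = 1;

      @Override
      public String getViolationDescription(short violationCode) {
        return violationCode == ROW_TOO_LONG ? "row id longer than 1KB" : "unknown violation";
      }

      @Override
      public List<Short> check(Environment env, Mutation mutation) {
        if (mutation.getRow().length > 1024) {
          return Collections.singletonList(ROW_TOO_LONG);
        }
        return Collections.emptyList(); // no violations
      }
    }

It would be enabled per table (e.g. with the shell's constraint -a
command), and that is where extra per-mutation CPU would come from.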
On Thu, Jul 6, 2017 at 8:18 AM, Dave Marion <[email protected]> wrote:
That's a good point. I would also look at increasing
tserver.total.mutation.queue.max. Are you seeing hold times? If not, I
would keep pushing harder until you do, then move to multiple tablet
servers. Do you have any GC logs?
On July 6, 2017 at 4:47 AM Cyrille Savelief <[email protected]> wrote:
Are you sure Accumulo is not waiting for your app's data? There might be
GC pauses in your ingest code (we have already experienced that).
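A quick way to check is to run the ingest JVMs with GC logging enabled,
for example (Java 8 flags; the jar name and log path are just
placeholders):

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
         -Xloggc:/tmp/ingestor-gc.log -jar my-ingestor.jar

Long pauses or a steadily growing old generation in that log would point
at the client side rather than Accumulo.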
On Thu, Jul 6, 2017 at 10:32, Massimilian Mattetti
<[email protected]> wrote:
Thank you all for the suggestions.
Regarding the native memory map, I checked the logs on each tablet
server and it was loaded correctly (tserver.memory.maps.native.enabled
was set to true, of course), so GC pauses should not be the problem. I
managed to get a much better ingestion graph by reducing the native map
size to *2GB* and increasing the number of Batch Writer threads from the
default (3, which was really bad for my configuration) to *10* (I think
it does not make sense to have more threads than tablet servers, am I
right?).
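In code, the thread change is just the BatchWriterConfig setting; a
minimal sketch, assuming an existing Connector (the table name and the
memory/latency values are placeholders, not what I actually tuned):

    import java.util.concurrent.TimeUnit;
    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;

    public class IngestSketch {
      static void ingest(Connector connector) throws Exception {
        BatchWriterConfig cfg = new BatchWriterConfig()
            .setMaxWriteThreads(10)                 // raised from the default of 3
            .setMaxMemory(64L * 1024 * 1024)        // 64MB client-side buffer (placeholder)
            .setMaxLatency(30, TimeUnit.SECONDS);   // flush at least every 30s (placeholder)
        BatchWriter writer = connector.createBatchWriter("ingest_table", cfg);
        Mutation m = new Mutation(new Text("row_0001"));
        m.put(new Text("cf"), new Text("cq"), new Value("value".getBytes()));
        writer.addMutation(m);
        writer.close();
      }
    }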
The configuration that I used for the table is:
"table.file.replication": "2",
"table.compaction.minor.logs.threshold": "3",
"table.durability": "flush",
"table.split.threshold": "1G"
while for the tablet servers is:
"tserver.wal.blocksize": "1G",
"tserver.walog.max.size": "2G",
"tserver.memory.maps.max": "2G",
"tserver.compaction.minor.concurrent.max": "50",
"tserver.compaction.major.concurrent.max": "20",
"tserver.wal.replication": "2",
"tserver.compaction.major.thread.files.open.max": "15"
The new graph:
I still have the problem that CPU usage is below *20%*. So I am thinking
of running multiple tablet servers per node (5 or 10) in order to
maximize CPU usage. Beyond that I do not have any other ideas on how to
stress these servers with ingestion.
Any suggestions are very welcome. Meanwhile, thank you all again for
your help.
Best Regards,
Massimiliano
From: Jonathan Wonders <[email protected]>
To: [email protected]
Date: 06/07/2017 04:01
Subject: Re: maximize usage of cluster resources during ingestion
------------------------------------------------------------------------
Hi Massimilian,
Are you seeing held commits during the ingest pauses? Just based on
having looked at many similar graphs in the past, this might be one of
the major culprits.

A tablet server has a memory region with a bounded size
(tserver.memory.maps.max) where it buffers data that has not yet been
written to RFiles (through the process of minor compaction). The region
is segmented by tablet, and each tablet can have a buffer that is
undergoing ingest as well as a buffer that is undergoing minor
compaction. A memory manager decides when to initiate minor compactions
for the tablet buffers; the default implementation tries to keep the
memory region 80-90% full while preferring to compact the largest tablet
buffers. Creating larger RFiles during minor compaction should lead to
fewer major compactions. During a minor compaction, the tablet buffer
still "consumes" memory within the in-memory map, and high ingest rates
can exhaust the remaining capacity. The default memory manager uses an
adaptive strategy to predict the expected memory usage and makes
compaction decisions that should maintain some free memory. Batch
writers can be bursty and a bit unpredictable, which can throw off these
estimates. Also, depending on the ingest profile, sometimes an in-memory
tablet buffer will consume a large percentage of the total buffer. This
leads to long minor compactions when the buffer size is large, which can
give ingest enough time to exhaust the buffer before that memory can be
reclaimed.

When a tablet server has to block ingest, it can affect client ingest
rates to other tablet servers due to the way that batch writers work.
This can lead to other tablet servers underestimating future ingest
rates, which can further exacerbate the problem.
There are some configuration changes that could reduce the severity of
held commits, although they might reduce peak ingest rates. Reducing
the in-memory map size can reduce the maximum pause time due to held
commits. Adding more tablets should help avoid the problem of a single
tablet buffer consuming a large percentage of the memory region; it
might be better to aim for ~20 tablets per server if your problem allows
for it. It is also possible to replace the memory manager with a custom
one. I've tried this in the past and have seen stability improvements
from making the memory thresholds less aggressive (50-75% full). This
did reduce peak ingest rate in some cases, but that was a reasonable
tradeoff.
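Both of those changes can be made from the shell, roughly like this (the
value, table name, and split file are placeholders; the map size change
likely requires a tserver restart to take effect):

    config -s tserver.memory.maps.max=8G
    addsplits -t ingest_table -sf /path/to/splits.txt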
Based on your current configuration, if a tablet server is serving 4
tablets and has a 32GB buffer, your first minor compactions will be at
least 8GB and they will probably grow larger over time until the tablets
naturally split. Consider how long it would take to write this RFile
compared to your peak ingest rate. As others have suggested, make sure
to use the native maps. Based on your current JVM heap size, using the
Java in-memory map would probably lead to OOME or very bad GC performance.
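To put rough numbers on it: writing an 8GB RFile at the ~120MB/s
aggregate disk rate you reported takes on the order of 70 seconds, and
if mutations keep arriving at a similar rate during that window, the
remaining map capacity has to absorb several more GB before the flush
completes; that is exactly the situation where commits end up being
held.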
Accumulo can trace minor compaction durations so you can get a feel for
max pause times or measure the effect of configuration changes.
Cheers,
--Jonathan
On Wed, Jul 5, 2017 at 7:16 PM, Dave Marion <[email protected]> wrote:
Based on what Cyrille said, I would look at garbage collection,
specifically at how much of your newly allocated data spills into the
old generation before it is flushed to disk. Additionally, I would turn
off the debug log, or log to SSDs if you have them. Another thought,
seeing that you have 256GB of RAM per node, is to run multiple tablet
servers per node. Do you have 10 threads on your Batch Writers? What
about the Batch Writer latency? Is it so low that you are not filling
the buffer?
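On the logging side, something along these lines usually does it (file
locations depend on your install, so treat it as a sketch):

    # conf/accumulo-env.sh: add GC logging to the tserver JVMs
    export ACCUMULO_TSERVER_OPTS="${ACCUMULO_TSERVER_OPTS} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/accumulo/tserver-gc.log"

    # log4j configuration: raise the Accumulo log level from DEBUG to WARN
    log4j.logger.org.apache.accumulo=WARN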
From: Massimilian Mattetti <[email protected]>
Sent: Wednesday, July 05, 2017 8:37 AM
To: [email protected]
Subject: maximize usage of cluster resources during ingestion
Hi all,
I have an Accumulo 1.8.1 cluster made up of 12 bare-metal servers. Each
server has 256GB of RAM and 2 x 10-core CPUs. Two machines are used as
masters (running the HDFS NameNodes, Accumulo Master, and Monitor). The
other 10 machines have 12 disks of 1TB each (11 used by the HDFS
DataNode process) and run the Accumulo TServer processes. All the
machines are connected via a 10Gb network, and 3 of them run ZooKeeper.
I have run some heavy ingestion tests on this cluster but have never
been able to reach more than *20%* CPU usage on any Tablet Server. I am
running an ingestion process (using batch writers) on each data node.
The table is pre-split in order to have 4 tablets per tablet server.
Monitoring the network, I have seen that data is received/sent by each
node with a peak rate of about 120MB/s / 100MB/s, while the aggregated
disk write throughput on each tablet server is around 120MB/s.
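For reference, the pre-splitting can be done through the client API; a
minimal sketch (instance name, ZooKeeper hosts, credentials, and split
points are placeholders; 39 splits give 40 tablets, i.e. 4 per tablet
server across the 10 servers):

    import java.util.SortedSet;
    import java.util.TreeSet;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.hadoop.io.Text;

    public class PreSplit {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("accumulo", "zk1:2181,zk2:2181,zk3:2181")
            .getConnector("ingest_user", new PasswordToken("secret"));
        SortedSet<Text> splits = new TreeSet<>();
        for (int i = 1; i < 40; i++) {
          splits.add(new Text(String.format("row_%02d", i))); // placeholder split points
        }
        conn.tableOperations().addSplits("ingest_table", splits);
      }
    }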
The table configuration I am playing with is:
"table.file.replication": "2",
"table.compaction.minor.logs.threshold": "10",
"table.durability": "flush",
"table.file.max": "30",
"table.compaction.major.ratio": "9",
"table.split.threshold": "1G"
while the tablet server configuration is:
"tserver.wal.blocksize": "2G",
"tserver.walog.max.size": "8G",
"tserver.memory.maps.max": "32G",
"tserver.compaction.minor.concurrent.max": "50",
"tserver.compaction.major.concurrent.max": "8",
"tserver.total.mutation.queue.max": "50M",
"tserver.wal.replication": "2",
"tserver.compaction.major.thread.files.open.max": "15"
The tablet server heap has been set to 32GB.
From the Monitor UI:
As you can see, I have a lot of valleys in which the ingestion rate
drops to 0.
What would be a good procedure to identify the bottleneck that causes
these periods of zero ingestion rate?
Thanks.
Best Regards,
Max