I've personally never seen full CPU utilization during pure ingest; typically the bottleneck has been I/O. Under a heavy ingest load, the majority of steady-state CPU time probably goes to compression (unless you have custom constraints running), so it depends on which compression algorithm you have selected. There is probably a measurable contribution from inserting into the in-memory map. Otherwise, not much computation happens per mutation during ingest.
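For what it's worth, the codec is a per-table property and is cheap to experiment with. A minimal sketch using the 1.8 Java client API; the instance name, ZooKeeper hosts, credentials, and table name below are placeholders:

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class CompressionSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust for your cluster.
        Connector conn = new ZooKeeperInstance("accumulo", "zk1:2181,zk2:2181,zk3:2181")
                .getConnector("root", new PasswordToken("secret"));

        // Trade file size for CPU: snappy is typically much cheaper to compress
        // than the default gz. Only newly written RFiles pick up the new codec.
        conn.tableOperations().setProperty("ingest_test", "table.file.compress.type", "snappy");
    }
}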
On Thu, Jul 6, 2017 at 8:18 AM, Dave Marion <[email protected]> wrote:

> That's a good point. I would also look at increasing tserver.total.mutation.queue.max. Are you seeing hold times? If not, I would keep pushing harder until you do, then move to multiple tablet servers. Do you have any GC logs?
>
> On July 6, 2017 at 4:47 AM Cyrille Savelief <[email protected]> wrote:
>
> Are you sure Accumulo is not waiting for your app's data? There might be GC pauses in your ingest code (we have already experienced that).
>
> On Thu, Jul 6, 2017 at 10:32 AM, Massimilian Mattetti <[email protected]> wrote:
>
>> Thank you all for the suggestions.
>>
>> About the native memory map, I checked the logs on each tablet server and it was loaded correctly (of course tserver.memory.maps.native.enabled was set to true), so GC pauses should not be the problem after all. I managed to get a much better ingestion graph by reducing the native map size to 2GB and increasing the number of Batch Writer threads from the default of 3 (which was really bad for my configuration) to 10 (I think it does not make sense to have more threads than tablet servers, am I right?).
>>
>> The configuration that I used for the table is:
>> "table.file.replication": "2",
>> "table.compaction.minor.logs.threshold": "3",
>> "table.durability": "flush",
>> "table.split.threshold": "1G"
>>
>> while for the tablet servers it is:
>> "tserver.wal.blocksize": "1G",
>> "tserver.walog.max.size": "2G",
>> "tserver.memory.maps.max": "2G",
>> "tserver.compaction.minor.concurrent.max": "50",
>> "tserver.compaction.major.concurrent.max": "20",
>> "tserver.wal.replication": "2",
>> "tserver.compaction.major.thread.files.open.max": "15"
>>
>> The new graph:
>>
>> I still have the problem of CPU usage that is less than 20%, so I am thinking of running multiple tablet servers per node (5 or 10, say) in order to maximize CPU usage. Besides that, I do not have any other ideas for how to stress those servers with ingestion. Any suggestions are very welcome. Meanwhile, thank you all again for your help.
>>
>> Best Regards,
>> Massimiliano
>>
>> From: Jonathan Wonders <[email protected]>
>> To: [email protected]
>> Date: 06/07/2017 04:01
>> Subject: Re: maximize usage of cluster resources during ingestion
>> ------------------------------
>>
>> Hi Massimilian,
>>
>> Are you seeing held commits during the ingest pauses? Just based on having looked at many similar graphs in the past, this might be one of the major culprits. A tablet server has a memory region with a bounded size (tserver.memory.maps.max) where it buffers data that has not yet been written to RFiles (through the process of minor compaction). The region is segmented by tablet, and each tablet can have a buffer that is undergoing ingest as well as a buffer that is undergoing minor compaction. A memory manager decides when to initiate minor compactions for the tablet buffers, and the default implementation tries to keep the memory region 80-90% full while preferring to compact the largest tablet buffers. Creating larger RFiles during minor compaction should lead to fewer major compactions. During a minor compaction, the tablet buffer still "consumes" memory within the in-memory map, and high ingest rates can lead to exhausting the remaining capacity.
>> The default memory manager uses an adaptive strategy to predict the expected memory usage and makes compaction decisions that should maintain some free memory. Batch writers can be bursty and a bit unpredictable, which could throw off these estimates. Also, depending on the ingest profile, sometimes an in-memory tablet buffer will consume a large percentage of the total buffer. This leads to long minor compactions when the buffer size is large, which can allow ingest enough time to exhaust the buffer before that memory can be reclaimed. When a tablet server has to block ingest, it can affect client ingest rates to other tablet servers due to the way that batch writers work. This can lead to other tablet servers underestimating future ingest rates, which can further exacerbate the problem.
>>
>> There are some configuration changes that could reduce the severity of held commits, although they might reduce peak ingest rates. Reducing the in-memory map size can reduce the maximum pause time due to held commits. Adding additional tablets should help avoid the problem of a single tablet buffer consuming a large percentage of the memory region. It might be better to aim for ~20 tablets per server if your problem allows for it. It is also possible to replace the memory manager with a custom one. I've tried this in the past and have seen stability improvements by making the memory thresholds less aggressive (50-75% full). This did reduce peak ingest rate in some cases, but that was a reasonable tradeoff.
>>
>> Based on your current configuration, if a tablet server is serving 4 tablets and has a 32GB buffer, your first minor compactions will be at least 8GB, and they will probably grow larger over time until the tablets naturally split. Consider how long it would take to write this RFile compared to your peak ingest rate. As others have suggested, make sure to use the native maps. Based on your current JVM heap size, using the Java in-memory map would probably lead to OOME or very bad GC performance.
>>
>> Accumulo can trace minor compaction durations, so you can get a feel for max pause times or measure the effect of configuration changes.
>>
>> Cheers,
>> --Jonathan
>>
>> On Wed, Jul 5, 2017 at 7:16 PM, Dave Marion <[email protected]> wrote:
>>
>> Based on what Cyrille said, I would look at garbage collection; specifically, I would look at how many of your newly allocated objects spill into the old generation before they are flushed to disk. Additionally, I would turn off the debug log, or log to SSDs if you have them. Another thought, seeing that you have 256GB RAM per node, is to run multiple tablet servers per node. Do you have 10 threads on your Batch Writers? What about the Batch Writer latency? Is it set too low, such that you are not filling the buffer?
>>
>> From: Massimilian Mattetti [mailto:[email protected]]
>> Sent: Wednesday, July 05, 2017 8:37 AM
>> To: [email protected]
>> Subject: maximize usage of cluster resources during ingestion
>>
>> Hi all,
>>
>> I have an Accumulo 1.8.1 cluster made up of 12 bare-metal servers. Each server has 256GB of RAM and 2 x 10-core CPUs. 2 machines are used as masters (running the HDFS NameNodes, Accumulo Master, and Monitor). The other 10 machines have 12 disks of 1TB each (11 used by the HDFS DataNode process) and are running Accumulo TServer processes.
>> All the machines are connected via a 10Gb network, and 3 of them are running ZooKeeper. I have run some heavy ingestion tests on this cluster but I have never been able to reach more than 20% CPU usage on each tablet server. I am running an ingestion process (using batch writers) on each data node. The table is pre-split in order to have 4 tablets per tablet server. Monitoring the network, I have seen that data is received/sent from each node at a peak rate of about 120MB/s / 100MB/s, while the aggregated disk write throughput on each tablet server is around 120MB/s.
>>
>> The table configuration I am playing with is:
>> "table.file.replication": "2",
>> "table.compaction.minor.logs.threshold": "10",
>> "table.durability": "flush",
>> "table.file.max": "30",
>> "table.compaction.major.ratio": "9",
>> "table.split.threshold": "1G"
>>
>> while the tablet server configuration is:
>> "tserver.wal.blocksize": "2G",
>> "tserver.walog.max.size": "8G",
>> "tserver.memory.maps.max": "32G",
>> "tserver.compaction.minor.concurrent.max": "50",
>> "tserver.compaction.major.concurrent.max": "8",
>> "tserver.total.mutation.queue.max": "50M",
>> "tserver.wal.replication": "2",
>> "tserver.compaction.major.thread.files.open.max": "15"
>>
>> The tablet server heap has been set to 32GB.
>>
>> From the Monitor UI:
>>
>> As you can see, I have a lot of valleys in which the ingestion rate reaches 0. What would be a good procedure to identify the bottleneck that causes the zero-ingestion-rate periods?
>> Thanks.
>>
>> Best Regards,
>> Max
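Regarding the Batch Writer questions in the thread above (thread count, buffer size, and latency), those knobs all live on BatchWriterConfig. A rough sketch of how I would configure a writer for this kind of load; the values are starting points to experiment with, not recommendations, and the table name and Connector come from the same placeholder setup as the earlier sketch:

import java.util.concurrent.TimeUnit;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class IngestSketch {
    static void writeSome(Connector conn) throws Exception {
        BatchWriterConfig cfg = new BatchWriterConfig()
                .setMaxWriteThreads(10)                 // roughly one per tablet server
                .setMaxMemory(64 * 1024 * 1024)         // 64MB client-side buffer
                .setMaxLatency(120, TimeUnit.SECONDS);  // high enough that flushes are size-driven, not time-driven

        BatchWriter writer = conn.createBatchWriter("ingest_test", cfg);
        try {
            Mutation m = new Mutation("row_0001");
            m.put("cf", "cq", new Value("value".getBytes()));
            writer.addMutation(m);
        } finally {
            writer.close(); // flushes whatever is still buffered
        }
    }
}

A latency that is too low (or a buffer that is too small) means the client sends many tiny batches and never fills the buffer, which is exactly the situation Dave was asking about.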
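On the suggestion above of aiming for ~20 tablets per server: rather than waiting for table.split.threshold to create them, you can pre-split so each server holds that many tablets from the start. A sketch, assuming (purely as an illustration of the row key design) that keys are evenly distributed over 4-digit hex prefixes; substitute split points that match your actual key space:

import java.util.TreeSet;
import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class PreSplitSketch {
    static void preSplit(Connector conn) throws Exception {
        // 10 tablet servers * 20 tablets each = 200 tablets, i.e. 199 split points.
        int tablets = 200;
        TreeSet<Text> splits = new TreeSet<>();
        for (int i = 1; i < tablets; i++) {
            splits.add(new Text(String.format("%04x", i * 0x10000 / tablets)));
        }
        conn.tableOperations().addSplits("ingest_test", splits);
    }
}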
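Finally, on the question of how to identify the bottleneck behind the zero-ingest valleys: one cheap first step on the client side is to time the writer calls themselves. If addMutation (or flush) blocks for long stretches, the backpressure is coming from the servers (held commits, slow minor compactions); if the calls return quickly and the gaps are in your own pipeline, the problem is upstream, as Cyrille suggested. A rough sketch of that instrumentation; the 500 ms threshold and the logging are arbitrary:

import java.util.concurrent.TimeUnit;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Mutation;

public class TimedWrites {
    // Coarse stall detector around addMutation; 'writer' comes from the earlier sketch.
    static void addTimed(BatchWriter writer, Mutation m) throws Exception {
        long start = System.nanoTime();
        writer.addMutation(m);
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        if (millis > 500) {
            // Long stalls here usually mean the client buffer is full and the
            // tablet servers are not draining it fast enough.
            System.out.printf("addMutation blocked for %d ms%n", millis);
        }
    }
}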
