This was a great question. I want to start recording answers to these types of
questions in the troubleshooting documentation[1] for 2.0. I made a pull
request[2] to the website repo for this one if anyone wants to
review/comment on it.

[1]: https://accumulo.apache.org/docs/unreleased/troubleshooting/basic
[2]: https://github.com/apache/accumulo-website/pull/18
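
For context, the fix Christopher mentions below is Accumulo's native
in-memory map. A minimal sketch (using the 1.8 client API; the instance
name, ZooKeeper host, and credentials are placeholders) for checking from
client code whether it is enabled:

    import java.util.Map;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Instance;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class CheckNativeMaps {
      public static void main(String[] args) throws Exception {
        // Placeholder connection details -- replace with your own.
        Instance instance = new ZooKeeperInstance("myInstance", "zkhost:2181");
        Connector conn = instance.getConnector("root", new PasswordToken("secret"));
        // Read the effective system configuration seen by the servers.
        Map<String, String> conf = conn.instanceOperations().getSystemConfiguration();
        System.out.println("tserver.memory.maps.native.enabled = "
            + conf.get("tserver.memory.maps.native.enabled"));
      }
    }

When that property is false, or the native library is not installed on the
tablet servers, the in-memory map lives on the Java heap, which is where long
GC pauses usually come from.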


On Wed, Jul 5, 2017 at 3:32 PM Christopher <[email protected]> wrote:

> Huge GC pauses can be mitigated by ensuring you're using the Accumulo
> native maps library.
>
> On Wed, Jul 5, 2017 at 11:05 AM Cyrille Savelief <[email protected]>
> wrote:
>
>> Hi Massimilian,
>>
>> Using a MultiTableBatchWriter we are able to ingest about 600K entries/s
>> on a single node (30GB of memory, 8 vCPUs) running Hadoop, ZooKeeper,
>> Accumulo and our ingest process. For us, the "valleys" came from huge GC
>> pauses.
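>>
>> For reference, a minimal sketch of such an ingest loop, assuming a
>> Connector named conn and a pre-created table; the table name, row format
>> and values are placeholders, not our actual code:
>>
>>     // imports: org.apache.accumulo.core.client.{BatchWriter,
>>     //   BatchWriterConfig, MultiTableBatchWriter},
>>     //   org.apache.accumulo.core.data.{Mutation, Value},
>>     //   org.apache.hadoop.io.Text, java.util.concurrent.TimeUnit
>>     BatchWriterConfig cfg = new BatchWriterConfig()
>>         .setMaxMemory(64 * 1024 * 1024)        // 64MB client-side buffer
>>         .setMaxLatency(2, TimeUnit.MINUTES)
>>         .setMaxWriteThreads(8);
>>     MultiTableBatchWriter mtbw = conn.createMultiTableBatchWriter(cfg);
>>     try {
>>       BatchWriter bw = mtbw.getBatchWriter("mytable");  // one writer per table
>>       for (int i = 0; i < 1_000_000; i++) {
>>         Mutation m = new Mutation(new Text(String.format("row_%08d", i)));
>>         m.put(new Text("cf"), new Text("cq"),
>>             new Value(("value-" + i).getBytes(java.nio.charset.StandardCharsets.UTF_8)));
>>         bw.addMutation(m);
>>       }
>>     } finally {
>>       mtbw.close();  // flushes and closes all per-table writers
>>     }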
>>
>> Best,
>>
>> Cyrille
>>
>> On Wed, Jul 5, 2017 at 2:37 PM, Massimilian Mattetti <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have an Accumulo 1.8.1 cluster made up of 12 bare-metal servers. Each
>>> server has 256GB of RAM and 2 x 10-core CPUs. 2 machines are used as
>>> masters (running the HDFS NameNodes, the Accumulo Master and the Monitor).
>>> The other 10 machines have 12 disks of 1TB each (11 of which are used by
>>> the HDFS DataNode process) and are running the Accumulo TServer processes.
>>> All the machines are connected via a 10Gb network and 3 of them are
>>> running ZooKeeper. I have run some heavy ingestion tests on this cluster
>>> but I have never been able to reach more than 20% CPU usage on each Tablet
>>> Server. I am running an ingestion process (using batch writers) on each
>>> data node. The table is pre-split in order to have 4 tablets per tablet
>>> server. Monitoring the network I have seen that data is received/sent by
>>> each node at peak rates of about 120MB/s / 100MB/s, while the aggregated
>>> disk write throughput on each tablet server is around 120MB/s.
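>>>
>>> A minimal sketch of how the pre-splitting is done, assuming a Connector
>>> named conn; the table name and split points here are just illustrative:
>>>
>>>     // imports: java.util.TreeSet, org.apache.hadoop.io.Text,
>>>     //   org.apache.accumulo.core.client.Connector
>>>     TreeSet<Text> splits = new TreeSet<>();
>>>     // 4 tablets per tablet server x 10 tablet servers => 39 split points
>>>     for (int i = 1; i < 40; i++) {
>>>       splits.add(new Text(String.format("row_%02d", i)));
>>>     }
>>>     conn.tableOperations().create("mytable");        // placeholder name
>>>     conn.tableOperations().addSplits("mytable", splits);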
>>>
>>> The table configuration properties I am playing with are:
>>> "table.file.replication": "2",
>>> "table.compaction.minor.logs.threshold": "10",
>>> "table.durability": "flush",
>>> "table.file.max": "30",
>>> "table.compaction.major.ratio": "9",
>>> "table.split.threshold": "1G"
>>>
>>> while the tablet server configuration is:
>>> "tserver.wal.blocksize": "2G",
>>> "tserver.walog.max.size": "8G",
>>> "tserver.memory.maps.max": "32G",
>>> "tserver.compaction.minor.concurrent.max": "50",
>>> "tserver.compaction.major.concurrent.max": "8",
>>> "tserver.total.mutation.queue.max": "50M",
>>> "tserver.wal.replication": "2",
>>> "tserver.compaction.major.thread.files.open.max": "15"
>>>
>>> The tablet server heap has been set to 32GB.
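>>>
>>> For reference, per-table properties like the ones above can be set from
>>> the shell (config -t <table> -s <property>=<value>) or from client code;
>>> a sketch assuming a Connector named conn and a placeholder table name:
>>>
>>>     // imports: org.apache.accumulo.core.client.Connector
>>>     conn.tableOperations().setProperty("mytable", "table.durability", "flush");
>>>     conn.tableOperations().setProperty("mytable", "table.file.max", "30");
>>>     conn.tableOperations().setProperty("mytable", "table.split.threshold", "1G");
>>>
>>> System-wide tserver.* properties are normally set in accumulo-site.xml,
>>> or with "config -s" for the ones that can change at runtime.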
>>>
>>> From the Monitor UI (ingest rate graph; inline image not included):
>>>
>>> As you can see, there are a lot of valleys in which the ingestion rate
>>> drops to 0.
>>> What would be a good procedure to identify the bottleneck that causes
>>> these periods of zero ingestion rate?
>>> Thanks.
>>>
>>> Best Regards,
>>> Max
>>>
>>>
