On Thu, Apr 4, 2013 at 2:25 PM, Eric Newton <[email protected]> wrote:
> Have you pre-split your tablet to spread the load out to all the machines?
> Yes. We are using splits from loading the whole dataset previously.
> Does the data distribution match your splits?
> Yes. See above.
> Is the ingest data already sorted (that is, it always writes to the last
> tablet)?
> No. The data writes to multiple tablets concurrently. We set up a queue
> parameter and divide the data into multiple queues.
> How much memory and how many threads are you using in your batchwriters?
> I believe we have 16GB of memory for the Java writer with 18 threads
> running per server.
>
> Check the ingest rates on tablet server monitor page and look for hot
> spots.
> There are certain servers that have higher ingest rates, and the server
> that is busiest changes over time, but the overall ingestion rate will not
> go up.
>
> On Thu, Apr 4, 2013 at 2:01 PM, Jimmy Lin <[email protected]> wrote:
>
>> Hello,
>> I am fairly new to Accumulo and am trying to figure out what is
>> preventing my system from ingesting data at a faster rate. We have 15 nodes
>> running a simple Java program that reads and writes to Accumulo and then
>> indexes some data into Solr. The rate of ingest is not scaling linearly
>> with the number of nodes that we start up. I have tried increasing several
>> parameters including:
>> - limit of file descriptors in linux
>> - max zookeeper connections
>> - tserver.memory.maps.max
>> - tserver_opts memory size
>> - tserver.mutation_queue.max
>> - tserver.scan.files.open.max
>> - tserver.walog.max.size
>> - tserver.cache.data.size
>> - tserver.cache.index.size
>> - hdfs setting for xceivers
>> No matter what changes we make, we cannot get the ingest rate to go over
>> 100k entries/s and about 6 Mb/s. I know Accumulo should be able to ingest
>> faster than this.
>> Thanks in advance,
>>
>> Jimmy Lin
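
One way to double-check the "does the data distribution match your splits" question offline is to bucket a sample of row IDs against the split points and see how many land in each tablet. The following is only a rough sketch in plain Java with no Accumulo dependency: the class name `SplitCheck`, the method `countPerTablet`, and the sample rows are all made up for illustration, and it assumes row IDs are ASCII strings so that Java string ordering agrees with Accumulo's byte ordering.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper (not part of Accumulo): counts sample rows per tablet. */
class SplitCheck {

    /** Label for the last tablet, which has no end row. */
    static final String LAST_TABLET = "<last>";

    /**
     * A tablet holds the rows in (previousSplit, split], so a row belongs to
     * the tablet whose end row is the smallest split point >= row.
     */
    static Map<String, Integer> countPerTablet(TreeSet<String> splits, List<String> sampleRows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Start every tablet at zero so empty tablets still show up.
        for (String split : splits) {
            counts.put(split, 0);
        }
        counts.put(LAST_TABLET, 0);

        for (String row : sampleRows) {
            // ceiling() returns null when the row sorts after every split point.
            String endRow = splits.ceiling(row);
            counts.merge(endRow == null ? LAST_TABLET : endRow, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up split points and row IDs, purely for illustration.
        TreeSet<String> splits = new TreeSet<>(Arrays.asList("g", "n", "t"));
        List<String> sample = Arrays.asList("apple", "g", "house", "tango", "zebra", "zulu");
        countPerTablet(splits, sample).forEach((endRow, n) ->
            System.out.println("tablet ending at " + endRow + ": " + n + " rows"));
    }
}
```

If a handful of tablets collect most of the sample, the splits probably no longer match the incoming data, which would fit with a busiest server that moves around while the overall rate stays flat.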
