Request for Configuration Help for basic test. Tservers dying and only one tablet being used

Pelton, Aaron A. Tue, 29 Jul 2014 10:09:19 -0700

Hi All,

I am new to Accumulo and I apologize if the answers to my questions are already 
posted somewhere. I've done a fair amount of googling and poking around the 
manuals etc.


I am just doing a simple test with two machines, one producing about 600 
threads on the network to stream simultaneous writes to a rest service, and the 
other producing about 300 threads on the network to perform simultaneous 
queries to a rest service. The rest service has Accumulo API calls in it to 
write out and query data.

I have inherited the following configuration


-          Squirrel Bundle distribution of Accumulo 1.5.0

-          1 Master machine to start and stop Accumulo services on

-          12 data nodes running tservers. The first three of these also 
running the zookeeper instances. And, nodes 4-6 running tracers.

I have noticed the following issues with configuration and changed them as 
follows

-          Changed swapiness to 0 on all nodes

-          Was getting OutOfMemoryExceptions after the above still, and after 
running test for long duration. Thus, increased Java Heap size from 1g to 4g, 
which is still far below the physical ram on the nodes.

-          Increased java heap from 1g to 2g on master node

-          I also increased the following properties

o     <property>

o       <name>tserver.memory.maps.max</name>

o       <value>2G</value>

o     </property>

o

o     <property>

o       <name>tserver.cache.data.size</name>

o       <value>512M</value>

o     </property>

o

o     <property>

o       <name>tserver.cache.index.size</name>

o       <value>512M</value>

o     </property>

-          Changed the ulimit for virtual memory to unlimited

-          Changed the ulimit for files opened to 65536

-          Changed the ulimit for max user processes to 1024

-          A tomcat instance with a server socket accepting up to 1,000 threads 
/ user connections to a rest service that eventually makes a read / write out 
to an Accumulo connector instance.

-          Changed the zookeeper connection limit max to 0 since this is just a 
test environment

-          Noticed that code I had inherited didn't have close calls on the 
scanner objects in the rest service b/c it was originally designed for Accumulo 
1.4 in which there wasn't such an API.

-          This may be wrong, but in an effort to see my ~900 connections 
simultaneously get as much access to db writes/reads for servicing, I up'd some 
thread counts for

o     <property>

o       <name>tserver.server.threads.minimum</name>

o       <value>75</value>

o     </property>

o

o     <property>

o       <name>master.server.threads.minimum</name>

o       <value>300</value>

o     </property>

I have a couple of problems to note:

1.       Ingest speeds seem kinda slow. I would anticipate network overhead but 
not enough to reduce writes to 125 records / sec when each record is only a few 
kB.

a.       I believe this is due to the fact that I'm only seeing one tserver 
primarily active at ingesting, with one tbalet in particular for the table 
receiving the bulk of the data.

b.      I have added pre-splits upon table creation for each letter of the 
alphabet, plus the digits 0-9. As this is a test with a simple loop creating ID 
values, I throw 2 alpha chars randomly in front of the generated number in my 
loop and use that as the ID to distribute hopefully the IDs across tablets for 
this table.  A record ID ingested might look like "bk1234:8876", whereby it has 
random 2 chars, orig ID value, colon, and a timestamp.  Sample pre-splitting: 
(Granted the array could be constructed more gracefully, but for a quick test, 
meh).
        try
        {
            conn.tableOperations().create(TABLE_NAME);

            final SortedSet<Text> sortedSplits = new TreeSet<Text>();
            for (String binPrefix : new String[] { "a", "b", "c", "d", "e", 
"f", "g", "h", "i", "j", "k", "l", "m",
                    "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", 
"z", "1", "2", "3", "4", "5", "6", "7",
                    "8", "9", "0" })
            {
                sortedSplits.add(new Text(binPrefix));
            }
            conn.tableOperations().addSplits(TABLE_NAME, sortedSplits);
        }
        catch (TableExistsException | TableNotFoundException exception)
        {
            LOGGER.warn("Could not create table or sorted splits", exception);
        }

2.       Tservers running on the data node halt after about 4 hours in of 
processing.  I'm attempting to ingest into the billions, hopefully trillions of 
records range.  Generally it is the ones that aren't under load in the 
beginning, until finally the one that is handling the bulk of the load crashes 
typically last. In the beginning, I noticed in the tserver logs the 
OutOfMemoryException, but haven't seen that in the past few runs after the 
memory adjustments. In fact the tserver log doesn't say anything about why it 
stopped.  Also didn't notice anything unusual in the zookeeper log other than 
the occasional CancelledKeyException.

3.       Lastly can anyone approximate with the 12 nodes that I have, what kind 
of ingest speed should I see if things were configured correctly in number of 
records per second based on small record sizes of a few kB. And, is anything 
obviously wrong with the configurations mentioned above that would improve 
throughput?



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
Sincerely,
Aaron Pelton

Request for Configuration Help for basic test. Tservers dying and only one tablet being used

Reply via email to