Same results with 2G tserver.memory.maps.max. Maybe we just reached the limit :)
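
For reference, the change under test lives in accumulo-site.xml and looks like
this (a minimal sketch; the property names and the 2G value come from this
thread, everything else is illustrative):

  <!-- Native maps live outside the JVM heap, so keep
       total_ram > JVM_heap + tserver.memory.maps.max
       (here: 7.5 GB total, 3G heap, 2G native maps). -->
  <property>
    <name>tserver.memory.maps.native.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>tserver.memory.maps.max</name>
    <value>2G</value>
  </property>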
On Mon, Feb 10, 2014 at 7:08 PM, Diego Woitasen <[email protected]> wrote:
> On Mon, Feb 10, 2014 at 6:21 PM, Josh Elser <[email protected]> wrote:
>> I assume you're running a datanode alongside the tserver on that node? That
>> may be stretching the capabilities of that node (not to mention ec2 nodes
>> tend to be a little flaky in general). 2G for tserver.memory.maps.max
>> might be a little safer.
>>
>> You got an error in a tserver log about that IOException in internalRead.
>> After that, the tserver was still alive? And the proxy client was dead -
>> quit normally?
>
> Yes, everything is still alive.
>
>> If that's the case, the proxy might just be disconnecting in a noisy manner?
>
> Right!
>
> I'll try with 2G tserver.memory.maps.max.
>
>> On 2/10/14, 3:38 PM, Diego Woitasen wrote:
>>> Hi,
>>>   I tried increasing tserver.memory.maps.max to 3G and it failed
>>> again, but with a different error. I have a heap size of 3G and 7.5 GB of
>>> total RAM.
>>>
>>> The error that I found in the failing tserver is:
>>>
>>> 2014-02-08 03:37:35,497 [util.TServerUtils$THsHaServer] WARN : Got an
>>> IOException in internalRead!
>>>
>>> The tserver hasn't crashed, but the client was disconnected during the
>>> test.
>>>
>>> Another hint is welcome :)
>>>
>>> On Mon, Feb 3, 2014 at 3:58 PM, Josh Elser <[email protected]> wrote:
>>>> Oh, ok. So that isn't quite as bad as it seems.
>>>>
>>>> The "commits are held" exception is thrown when the tserver is running low
>>>> on memory. The tserver will block new mutations coming in until it can
>>>> process the ones it already has and free up some memory. It makes sense
>>>> that you would see this more often when you have more proxy servers, as the
>>>> total amount of Mutations you can send to your Accumulo instance is
>>>> increased. With one proxy server, your tserver had enough memory to process
>>>> the incoming data. With many proxy servers, your tservers would likely fall
>>>> over eventually because they'll get bogged down in JVM garbage collection.
>>>>
>>>> If you have more memory that you can give the tservers, that would help.
>>>> Also, you should make sure that you're using the Accumulo native maps, as
>>>> these use off-JVM-heap space instead of JVM heap, which should help
>>>> tremendously with your ingest rates.
>>>>
>>>> Native maps should be on by default unless you turned them off using the
>>>> property 'tserver.memory.maps.native.enabled' in accumulo-site.xml.
>>>> Additionally, you can try increasing the size of the native maps using
>>>> 'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that with the
>>>> native maps, you need to ensure that total_ram > JVM_heap +
>>>> tserver.memory.maps.max.
>>>>
>>>> - Josh
>>>>
>>>> On 2/3/14, 1:33 PM, Diego Woitasen wrote:
>>>>> I've launched the cluster again and I was able to reproduce the error.
>>>>>
>>>>> In the proxy I had the same error that I mentioned in one of my previous
>>>>> messages, about a failure in a tablet server. I checked the log of that
>>>>> tablet server and I found:
>>>>>
>>>>> 2014-02-03 18:02:24,065 [thrift.ProcessFunction] ERROR: Internal error
>>>>> processing update
>>>>> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are
>>>>> held
>>>>>
>>>>> A lot of times.
>>>>>
>>>>> Full log if someone wants to have a look:
>>>>>
>>>>> http://www.vhgroup.net/diegows/tserver_matrix-slave-07.accumulo-ec2-test.com.debug.log
>>>>>
>>>>> Regards,
>>>>>   Diego
>>>>>
>>>>> On Mon, Feb 3, 2014 at 12:11 PM, Josh Elser <[email protected]> wrote:
>>>>>> I would assume that the proxy service would become a bottleneck fairly
>>>>>> quickly and your throughput would benefit from running multiple proxies,
>>>>>> but I don't have substantive numbers to back up that assertion.
>>>>>>
>>>>>> I'll put this on my list and see if I can reproduce something.
>>>>>>
>>>>>> On 2/3/14, 7:42 AM, Diego Woitasen wrote:
>>>>>>> I have to run the tests again because they were ec2 instances and I've
>>>>>>> destroyed them. It's easy to reproduce, BTW.
>>>>>>>
>>>>>>> My question is, does it make sense to run multiple proxies? Is there
>>>>>>> a limit? Right now I'm trying with 10 nodes and 10 proxies (running on
>>>>>>> every node). Maybe that doesn't make sense or it's a buggy
>>>>>>> configuration.
>>>>>>>
>>>>>>> On Fri, Jan 31, 2014 at 7:29 PM, Josh Elser <[email protected]> wrote:
>>>>>>>> When you had multiple proxies, what were the failures on that tablet
>>>>>>>> server (10.202.6.46:9997)?
>>>>>>>>
>>>>>>>> I'm curious why using one proxy didn't cause errors but multiple did.
>>>>>>>>
>>>>>>>> On 1/31/14, 4:44 PM, Diego Woitasen wrote:
>>>>>>>>> I've reproduced the error and I've found this in the proxy logs:
>>>>>>>>>
>>>>>>>>> 2014-01-31 19:47:50,430 [server.THsHaServer] WARN : Got an
>>>>>>>>> IOException in internalRead!
>>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>>>>>>>         at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>>>>>>>>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
>>>>>>>>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:305)
>>>>>>>>>         at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
>>>>>>>>>         at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:198)
>>>>>>>>>         at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
>>>>>>>>> 2014-01-31 19:51:13,185 [impl.ThriftTransportPool] WARN : Server
>>>>>>>>> 10.202.6.46:9997:9997 (30000) had 20 failures in a short time period,
>>>>>>>>> will not complain anymore
>>>>>>>>>
>>>>>>>>> A lot of these messages appear in all the proxies.
>>>>>>>>>
>>>>>>>>> I tried the same stress test against one proxy and I was able to
>>>>>>>>> increase the load without getting any error.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>   Diego
>>>>>>>>>
>>>>>>>>> On Thu, Jan 30, 2014 at 2:47 PM, Keith Turner <[email protected]> wrote:
>>>>>>>>>> Do you see more information in the proxy logs? "# exceptions 1" indicates
>>>>>>>>>> an unexpected exception occurred in the batch writer client code. The
>>>>>>>>>> proxy uses this client code, so maybe there will be a more detailed
>>>>>>>>>> stack trace in its logs.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 30, 2014 at 9:46 AM, Diego Woitasen <[email protected]> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>   I'm testing with a ten node cluster with the proxy enabled on all
>>>>>>>>>>> the nodes. I'm doing a stress test, balancing the connections across
>>>>>>>>>>> the proxies using round robin. When I increase the load (400 workers
>>>>>>>>>>> writing) I get this error:
>>>>>>>>>>>
>>>>>>>>>>> AccumuloSecurityException:
>>>>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>>>>> # constraint violations : 0 security codes: [] # server errors 0 #
>>>>>>>>>>> exceptions 1')
>>>>>>>>>>>
>>>>>>>>>>> The complete message is:
>>>>>>>>>>>
>>>>>>>>>>> AccumuloSecurityException:
>>>>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>>>>> # constraint violations : 0 security codes: [] # server errors 0 #
>>>>>>>>>>> exceptions 1')
>>>>>>>>>>> kvlayer-test client failed!
>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>   File "tests/kvlayer/test_accumulo_throughput.py", line 64, in __call__
>>>>>>>>>>>     self.client.put('t1', ((u,), self.one_mb))
>>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_decorators.py", line 26, in wrapper
>>>>>>>>>>>     return method(*args, **kwargs)
>>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_accumulo.py", line 154, in put
>>>>>>>>>>>     batch_writer.close()
>>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/__init__.py", line 126, in close
>>>>>>>>>>>     self._conn.client.closeWriter(self._writer)
>>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3149, in closeWriter
>>>>>>>>>>>     self.recv_closeWriter()
>>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3172, in recv_closeWriter
>>>>>>>>>>>     raise result.ouch2
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure if the error is produced by the way I'm using the
>>>>>>>>>>> cluster with multiple proxies; maybe I should use just one.
>>>>>>>>>>>
>>>>>>>>>>> Ideas are welcome.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>   Diego
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Diego Woitasen
>>>>>>>>>>> VHGroup - Linux and Open Source solutions architect
>>>>>>>>>>> www.vhgroup.net
>
> --
> Diego Woitasen
> VHGroup - Linux and Open Source solutions architect
> www.vhgroup.net

--
Diego Woitasen
VHGroup - Linux and Open Source solutions architect
www.vhgroup.net
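
(For readers reproducing the setup above: a minimal pyaccumulo sketch of the
round-robin write pattern described in the thread. This is not the actual
kvlayer test code; the hostnames, credentials, default proxy port 42424, and
column names are illustrative assumptions.)

import itertools
from pyaccumulo import Accumulo, Mutation

# Hypothetical endpoints: one proxy per node, ten nodes, as in the thread.
PROXIES = itertools.cycle([("node%02d.example.com" % i, 42424)
                           for i in range(1, 11)])

def write_one_mb(table, row, one_mb_value):
    # Round-robin proxy selection: each call goes to the next proxy.
    host, port = next(PROXIES)
    conn = Accumulo(host=host, port=port, user="root", password="secret")
    writer = conn.create_batch_writer(table)
    m = Mutation(row)
    m.put(cf="data", cq="payload", val=one_mb_value)
    writer.add_mutation(m)
    # Mutations are buffered on the proxy; server-side failures (like the
    # MutationsRejectedException with "# exceptions 1" above) surface here,
    # on close() - which is why the traceback ends in recv_closeWriter()
    # rather than in put().
    writer.close()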
