On Mon, Feb 10, 2014 at 6:21 PM, Josh Elser <[email protected]> wrote:
> I assume you're running a datanode alongside the tserver on that node? That
> may be stretching the capabilities of that node (not to mention EC2 nodes
> tend to be a little flaky in general). 2G for the tserver.memory.maps.max
> might be a little safer.
>
> You got an error in a tserver log about that IOException in internalRead.
> After that, the tserver was still alive? And the proxy client was dead -
> quit normally?

Yes, everything is still alive.

> If that's the case, the proxy might just be disconnecting in a noisy manner?

Right! I'll try with 2G tserver.memory.maps.max.
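For anyone following along, here is a rough back-of-the-envelope check of the
memory budget being discussed, using the numbers from this thread (7.5 GB of
RAM, a 3 GB tserver heap). The 1.5 GB "other_overhead" figure for the OS plus
the co-located datanode and proxy is an assumed placeholder, not a measured
value; the point is just Josh's rule that total_ram must stay above
JVM_heap + tserver.memory.maps.max, with some room to spare.

# Back-of-the-envelope memory check for a tserver using the native in-memory maps.
# Values in GB; the other_overhead estimate is an assumption, not a measurement.

total_ram      = 7.5   # instance RAM reported in this thread
tserver_heap   = 3.0   # tserver JVM heap reported in this thread
other_overhead = 1.5   # rough guess for OS + co-located datanode + proxy

def headroom(native_maps_max):
    """RAM left over after the heap, the native maps, and everything else."""
    return total_ram - tserver_heap - native_maps_max - other_overhead

for maps_max in (3.0, 2.0):   # the two tserver.memory.maps.max values discussed
    print("maps.max = %.1fG -> headroom = %.1fG" % (maps_max, headroom(maps_max)))

# maps.max = 3.0G -> headroom = 0.0G   (nothing to spare; the node starts to struggle)
# maps.max = 2.0G -> headroom = 1.0G   (the safer setting suggested above)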
> On 2/10/14, 3:38 PM, Diego Woitasen wrote:
>> Hi,
>>   I tried increasing tserver.memory.maps.max to 3G and it failed
>> again, but with a different error. I have a heap size of 3G and 7.5 GB of
>> total RAM.
>>
>> The error that I found in the crashed tserver is:
>>
>> 2014-02-08 03:37:35,497 [util.TServerUtils$THsHaServer] WARN : Got an
>> IOException in internalRead!
>>
>> The tserver hasn't crashed, but the client was disconnected during the
>> test.
>>
>> Another hint is welcome :)
>>
>> On Mon, Feb 3, 2014 at 3:58 PM, Josh Elser <[email protected]> wrote:
>>> Oh, ok. So that isn't quite as bad as it seems.
>>>
>>> The "commits are held" exception is thrown when the tserver is running low
>>> on memory. The tserver will block new mutations coming in until it can
>>> process the ones it already has and free up some memory. It makes sense
>>> that you would see this more often when you have more proxy servers, as the
>>> total amount of Mutations you can send to your Accumulo instance is
>>> increased. With one proxy server, your tserver had enough memory to process
>>> the incoming data. With many proxy servers, your tservers would likely fall
>>> over eventually because they'll get bogged down in JVM garbage collection.
>>>
>>> If you have more memory that you can give the tservers, that would help.
>>> Also, you should make sure that you're using the Accumulo native maps, as
>>> these use off-JVM-heap space instead of JVM heap, which should help
>>> tremendously with your ingest rates.
>>>
>>> Native maps should be on by default unless you turned them off using the
>>> property 'tserver.memory.maps.native.enabled' in accumulo-site.xml.
>>> Additionally, you can try increasing the size of the native maps using
>>> 'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that with the
>>> native maps, you need to ensure that total_ram > JVM_heap +
>>> tserver.memory.maps.max
>>>
>>> - Josh
>>>
>>> On 2/3/14, 1:33 PM, Diego Woitasen wrote:
>>>> I've launched the cluster again and I was able to reproduce the error.
>>>>
>>>> In the proxy I had the same error that I mentioned in one of my previous
>>>> messages, about a failure in a tablet server. I checked the log of that
>>>> tablet server and I found:
>>>>
>>>> 2014-02-03 18:02:24,065 [thrift.ProcessFunction] ERROR: Internal error
>>>> processing update
>>>> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are
>>>> held
>>>>
>>>> A lot of times.
>>>>
>>>> Full log if someone wants to have a look:
>>>>
>>>> http://www.vhgroup.net/diegows/tserver_matrix-slave-07.accumulo-ec2-test.com.debug.log
>>>>
>>>> Regards,
>>>>   Diego
>>>>
>>>> On Mon, Feb 3, 2014 at 12:11 PM, Josh Elser <[email protected]> wrote:
>>>>> I would assume that that proxy service would become a bottleneck fairly
>>>>> quickly and your throughput would benefit from running multiple proxies,
>>>>> but I don't have substantive numbers to back up that assertion.
>>>>>
>>>>> I'll put this on my list and see if I can reproduce something.
>>>>>
>>>>> On 2/3/14, 7:42 AM, Diego Woitasen wrote:
>>>>>> I have to run the tests again because they were EC2 instances and I've
>>>>>> destroyed them. It's easy to reproduce, BTW.
>>>>>>
>>>>>> My question is, does it make sense to run multiple proxies? Is there
>>>>>> a limit? Right now I'm trying with 10 nodes and 10 proxies (running on
>>>>>> every node). Maybe that doesn't make sense or it's a buggy
>>>>>> configuration.
>>>>>>
>>>>>> On Fri, Jan 31, 2014 at 7:29 PM, Josh Elser <[email protected]> wrote:
>>>>>>> When you had multiple proxies, what were the failures on that tablet
>>>>>>> server (10.202.6.46:9997)?
>>>>>>>
>>>>>>> I'm curious why using one proxy didn't cause errors but multiple did.
>>>>>>>
>>>>>>> On 1/31/14, 4:44 PM, Diego Woitasen wrote:
>>>>>>>> I've reproduced the error and I've found this in the proxy logs:
>>>>>>>>
>>>>>>>> 2014-01-31 19:47:50,430 [server.THsHaServer] WARN : Got an
>>>>>>>> IOException in internalRead!
>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>>>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>>>>>>         at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>>>>>>>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
>>>>>>>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:305)
>>>>>>>>         at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
>>>>>>>>         at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:198)
>>>>>>>>         at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
>>>>>>>> 2014-01-31 19:51:13,185 [impl.ThriftTransportPool] WARN : Server
>>>>>>>> 10.202.6.46:9997:9997 (30000) had 20 failures in a short time period,
>>>>>>>> will not complain anymore
>>>>>>>>
>>>>>>>> A lot of these messages appear in all the proxies.
>>>>>>>>
>>>>>>>> I tried the same stress tests against one proxy and I was able to
>>>>>>>> increase the load without getting any error.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>   Diego
>>>>>>>>
>>>>>>>> On Thu, Jan 30, 2014 at 2:47 PM, Keith Turner <[email protected]> wrote:
>>>>>>>>> Do you see more information in the proxy logs? "# exceptions 1"
>>>>>>>>> indicates an unexpected exception occurred in the batch writer client
>>>>>>>>> code. The proxy uses this client code, so maybe there will be a more
>>>>>>>>> detailed stack trace in its logs.
>>>>>>>>>
>>>>>>>>> On Thu, Jan 30, 2014 at 9:46 AM, Diego Woitasen <[email protected]> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>   I'm testing with a ten node cluster with the proxy enabled on all
>>>>>>>>>> the nodes. I'm doing a stress test balancing the connections between
>>>>>>>>>> the proxies using round robin.
>>>>>>>>>> When I increase the load (400 workers writing) I get this error:
>>>>>>>>>>
>>>>>>>>>> AccumuloSecurityException:
>>>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>>>> # constraint violations : 0 security codes: [] # server errors 0 #
>>>>>>>>>> exceptions 1')
>>>>>>>>>>
>>>>>>>>>> The complete message is:
>>>>>>>>>>
>>>>>>>>>> AccumuloSecurityException:
>>>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>>>> # constraint violations : 0 security codes: [] # server errors 0 #
>>>>>>>>>> exceptions 1')
>>>>>>>>>> kvlayer-test client failed!
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>   File "tests/kvlayer/test_accumulo_throughput.py", line 64, in __call__
>>>>>>>>>>     self.client.put('t1', ((u,), self.one_mb))
>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_decorators.py", line 26, in wrapper
>>>>>>>>>>     return method(*args, **kwargs)
>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_accumulo.py", line 154, in put
>>>>>>>>>>     batch_writer.close()
>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/__init__.py", line 126, in close
>>>>>>>>>>     self._conn.client.closeWriter(self._writer)
>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3149, in closeWriter
>>>>>>>>>>     self.recv_closeWriter()
>>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3172, in recv_closeWriter
>>>>>>>>>>     raise result.ouch2
>>>>>>>>>>
>>>>>>>>>> I'm not sure if the error is produced by the way I'm using the
>>>>>>>>>> cluster with multiple proxies; maybe I should use just one.
>>>>>>>>>>
>>>>>>>>>> Ideas are welcome.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>   Diego
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Diego Woitasen
>>>>>>>>>> VHGroup - Linux and Open Source solutions architect
>>>>>>>>>> www.vhgroup.net

--
Diego Woitasen
VHGroup - Linux and Open Source solutions architect
www.vhgroup.net
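For readers trying to reproduce the multi-proxy setup described above, here is
a minimal sketch of a round-robin pyaccumulo client along the lines of the
stress test in this thread. The proxy host names, port, credentials, and batch
sizes are illustrative assumptions, not the original test harness; the
Accumulo, create_batch_writer, and Mutation calls follow the pyaccumulo 1.5.x
API referenced in the traceback above.

import itertools

from pyaccumulo import Accumulo, Mutation

# Hypothetical proxy endpoints -- one proxy per node, as in the 10-node test.
PROXIES = [("node%02d.example.com" % i, 42424) for i in range(1, 11)]
proxy_cycle = itertools.cycle(PROXIES)

def write_batch(table, rows, value):
    """Send one batch of mutations through the next proxy in the rotation."""
    host, port = next(proxy_cycle)
    conn = Accumulo(host=host, port=port, user="root", password="secret")
    writer = conn.create_batch_writer(table)
    try:
        for row in rows:
            m = Mutation(row)
            m.put(cf="cf", cq="cq", val=value)
            writer.add_mutation(m)
    finally:
        # close() flushes the writer; a MutationsRejectedException on the server
        # side surfaces here, via the proxy, as the "# exceptions 1" error
        # quoted earlier in this thread.
        writer.close()
        conn.close()

if __name__ == "__main__":
    one_mb = "x" * (1024 * 1024)
    for batch in range(10):
        rows = ["row-%06d" % (batch * 100 + i) for i in range(100)]
        write_batch("t1", rows, one_mb)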
