Skip 00, nothing would come before it. :-)
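
For reference, a minimal sketch of generating that splits file for two-hex-digit prefixes (skipping 00, as noted above) and loading it with the shell's addsplits command that Josh mentions below. The table name and file path are just placeholders:

    # write two-hex-digit split points 01..ff to a local file (00 skipped,
    # since nothing sorts before it)
    with open("splits.txt", "w") as f:
        for i in range(1, 256):
            f.write("%02x\n" % i)

    # then, in the Accumulo shell (table name is an example):
    #   addsplits -t mytable -sf /local/path/to/splits.txt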
On Tue, Feb 11, 2014 at 12:34 PM, Mike Drob <[email protected]> wrote:

For uuid4 keys, you might want to do [00, 01, 02, ..., 0e, 0f, 10, ..., fd, fe, ff] to cover the full range.

On Tue, Feb 11, 2014 at 9:16 AM, Josh Elser <[email protected]> wrote:

Ok. Even so, try adding some split points to the tables before you begin (if you aren't already), as it will *greatly* smooth the startup.

Something like [00, 01, 02, ..., 10, 11, 12, ..., 97, 98, 99] would be good. You can easily dump this to a file on local disk, then run the `addsplits` command in the Accumulo shell and provide it that file with the -sf (I think) option.

On 2/11/14, 12:00 PM, Diego Woitasen wrote:

I'm using random keys for these tests. They are uuid4 keys.

On Tue, Feb 11, 2014 at 1:04 PM, Josh Elser <[email protected]> wrote:

The other thing I thought about: what's the distribution of Key-Values that you're writing? Specifically, do many of the Keys sort "near" each other? Similarly, do you notice excessive load on some tservers, but not all (the "Tablet Servers" page on the Monitor is a good check)?

Consider the following: you have 10 tservers and you have 10 proxy servers. The first thought is that 10 tservers should be plenty to balance the load of those 10 proxy servers. However, a problem arises when the data that each of those proxy servers is writing happens to reside on a _small number of tablet servers_. Thus, your 10 proxy servers might only be writing to one or two tabletservers.

If you notice that you're getting skew like this (or even just know that you're apt to have a situation where multiple clients might write data that sorts close to one another), it would be a good idea to add splits to your table before starting your workload.

For example, if your Key-space is the numbers from 1 to 10 and you have ten tservers, it would be a good idea to add splits 1, 2, ..., 10 so that each tserver hosts at least one tablet (e.g. [1,2), [2,3), ..., [10,+inf)). Having at least 5 or 10 tablets per tserver per table (split according to the distribution of your data) might help ease the load.

On 2/11/14, 10:47 AM, Diego Woitasen wrote:

Same results with 2G tserver.memory.maps.max.

Maybe we just reached the limit :)

On Mon, Feb 10, 2014 at 7:08 PM, Diego Woitasen <[email protected]> wrote:

> On Mon, Feb 10, 2014 at 6:21 PM, Josh Elser <[email protected]> wrote:
>
> I assume you're running a datanode alongside the tserver on that node? That may be stretching the capabilities of that node (not to mention ec2 nodes tend to be a little flaky in general). 2G for the tserver.memory.maps.max might be a little safer.
>
> You got an error in a tserver log about that IOException in internalRead. After that, the tserver was still alive? And the proxy client was dead - quit normally?

Yes, everything is still alive.

> If that's the case, the proxy might just be disconnecting in a noisy manner?

Right!

I'll try with 2G tserver.memory.maps.max.
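
For reference, the rough memory arithmetic on one of these 7.5 GB nodes (the heap and map sizes are the ones mentioned in this thread; how much the datanode and OS actually need is an assumption, not something measured here):

    3 GB tserver heap + 3 GB native maps = 6 GB, leaving roughly 1.5 GB for the datanode, OS cache and everything else
    3 GB tserver heap + 2 GB native maps = 5 GB, leaving roughly 2.5 GB of headroom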
On 2/10/14, 3:38 PM, Diego Woitasen wrote:

Hi,
I tried increasing tserver.memory.maps.max to 3G and it failed again, but with a different error. I have a heap size of 3G and 7.5 GB of total RAM.

The error that I've found in the affected tserver is:

    2014-02-08 03:37:35,497 [util.TServerUtils$THsHaServer] WARN : Got an IOException in internalRead!

The tserver hasn't crashed, but the client was disconnected during the test.

Another hint is welcome :)

On Mon, Feb 3, 2014 at 3:58 PM, Josh Elser <[email protected]> wrote:

Oh, ok. So that isn't quite as bad as it seems.

The "commits are held" exception is thrown when the tserver is running low on memory. The tserver will block new mutations coming in until it can process the ones it already has and free up some memory. It makes sense that you would see this more often when you have more proxy servers, as the total amount of Mutations you can send to your Accumulo instance is increased. With one proxy server, your tserver had enough memory to process the incoming data. With many proxy servers, your tservers would likely fall over eventually because they'll get bogged down in JVM garbage collection.

If you have more memory that you can give the tservers, that would help. Also, you should make sure that you're using the Accumulo native maps, as this will use off-JVM-heap space instead of JVM heap, which should help tremendously with your ingest rates.

Native maps should be on by default unless you turned them off using the property 'tserver.memory.maps.native.enabled' in accumulo-site.xml. Additionally, you can try increasing the size of the native maps using 'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that with the native maps, you need to ensure that total_ram > JVM_heap + tserver.memory.maps.max.

- Josh
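
For completeness, this is what those two settings look like in accumulo-site.xml; the values here are only illustrative for a node of this size, not a recommendation, and the tserver heap (-Xmx) is set separately, typically in accumulo-env.sh:

    <!-- accumulo-site.xml excerpt: Accumulo native in-memory map (example values) -->
    <property>
      <name>tserver.memory.maps.native.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>tserver.memory.maps.max</name>
      <value>2G</value>
    </property>

With a 3G tserver heap and 2G of native maps, the total stays comfortably under 7.5 GB, which is the total_ram > JVM_heap + tserver.memory.maps.max condition above.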
On 2/3/14, 1:33 PM, Diego Woitasen wrote:

I've launched the cluster again and I was able to reproduce the error.

In the proxy I had the same error that I mentioned in one of my previous messages, about a failure in a tablet server. I checked the log of that tablet server and I found:

    2014-02-03 18:02:24,065 [thrift.ProcessFunction] ERROR: Internal error processing update
    org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are held

A lot of times.

Full log if someone wants to have a look:

http://www.vhgroup.net/diegows/tserver_matrix-slave-07.accumulo-ec2-test.com.debug.log

Regards,
Diego

On Mon, Feb 3, 2014 at 12:11 PM, Josh Elser <[email protected]> wrote:

I would assume that that proxy service would become a bottleneck fairly quickly and your throughput would benefit from running multiple proxies, but I don't have substantive numbers to back up that assertion.

I'll put this on my list and see if I can reproduce something.

On 2/3/14, 7:42 AM, Diego Woitasen wrote:

I have to run the tests again because they were ec2 instances and I've destroyed them. It's easy to reproduce, BTW.

My question is: does it make sense to run multiple proxies? Is there a limit? Right now I'm trying with 10 nodes and 10 proxies (running on every node). Maybe that doesn't make sense or it's a buggy configuration.
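
To make the setup concrete, the "one proxy per node, clients balanced round-robin" arrangement described in the original post can be as simple as the sketch below on the pyaccumulo side. Host names, port and credentials are placeholders, not the actual test harness:

    # hypothetical round-robin selection of the per-node proxies
    from itertools import cycle
    from pyaccumulo import Accumulo

    PROXIES = cycle(["node-%02d.example.com" % i for i in range(1, 11)])

    def next_connection():
        # Each worker talks to the next proxy in the rotation. Accumulo still
        # decides which tservers the mutations ultimately land on, so this
        # balances proxy load, not tserver load.
        return Accumulo(host=next(PROXIES), port=42424, user="root", password="secret")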
On Fri, Jan 31, 2014 at 7:29 PM, Josh Elser <[email protected]> wrote:

When you had multiple proxies, what were the failures on that tablet server (10.202.6.46:9997)?

I'm curious why using one proxy didn't cause errors but multiple did.

On 1/31/14, 4:44 PM, Diego Woitasen wrote:

I've reproduced the error and I've found this in the proxy logs:

    2014-01-31 19:47:50,430 [server.THsHaServer] WARN : Got an IOException in internalRead!
    java.io.IOException: Connection reset by peer
            at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
            at sun.nio.ch.IOUtil.read(IOUtil.java:197)
            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
            at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
            at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
            at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:305)
            at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
            at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:198)
            at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
    2014-01-31 19:51:13,185 [impl.ThriftTransportPool] WARN : Server 10.202.6.46:9997:9997 (30000) had 20 failures in a short time period, will not complain anymore

A lot of these messages appear in all the proxies.

I tried the same stress test against one proxy and I was able to increase the load without getting any error.

Regards,
Diego

On Thu, Jan 30, 2014 at 2:47 PM, Keith Turner <[email protected]> wrote:

Do you see more information in the proxy logs? "# exceptions 1" indicates an unexpected exception occurred in the batch writer client code. The proxy uses this client code, so maybe there will be a more detailed stack trace in its logs.
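
For context, this is roughly the client-side path the proxy is exercising: a minimal pyaccumulo sketch, not the actual kvlayer code. Table name, row and connection details are placeholders:

    from pyaccumulo import Accumulo, Mutation

    conn = Accumulo(host="proxy-host", port=42424, user="root", password="secret")
    writer = conn.create_batch_writer("t1")

    m = Mutation("6f1c2a-some-uuid4-row")                    # uuid4-style row key
    m.put(cf="data", cq="payload", val="x" * (1024 * 1024))  # ~1 MB value, as in the test
    writer.add_mutation(m)

    # Mutations are buffered and sent asynchronously; rejected mutations (for
    # example while a tserver is holding commits) are reported when the writer
    # is flushed or closed, which the proxy passes back to the Thrift client as
    # the exception shown in the original post below.
    writer.close()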
On Thu, Jan 30, 2014 at 9:46 AM, Diego Woitasen <[email protected]> wrote:

Hi,
I'm testing with a ten node cluster with the proxy enabled on all the nodes. I'm doing a stress test, balancing the connections between the proxies using round robin. When I increase the load (400 workers writing) I get this error:

    AccumuloSecurityException: AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0 security codes: [] # server errors 0 # exceptions 1')

The complete message is:

    AccumuloSecurityException: AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0 security codes: [] # server errors 0 # exceptions 1')
    kvlayer-test client failed!
    Traceback (most recent call last):
      File "tests/kvlayer/test_accumulo_throughput.py", line 64, in __call__
        self.client.put('t1', ((u,), self.one_mb))
      File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_decorators.py", line 26, in wrapper
        return method(*args, **kwargs)
      File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_accumulo.py", line 154, in put
        batch_writer.close()
      File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/__init__.py", line 126, in close
        self._conn.client.closeWriter(self._writer)
      File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3149, in closeWriter
        self.recv_closeWriter()
      File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3172, in recv_closeWriter
        raise result.ouch2

I'm not sure if the error is produced by the way I'm using the cluster with multiple proxies; maybe I should use one.

Ideas are welcome.

Regards,
Diego

--
Diego Woitasen
VHGroup - Linux and Open Source solutions architect
www.vhgroup.net
