:-) You're welcome. The maximum Twitter tweet rate to date is 33,388 tweets/second.
You ingested data at twice that rate. Not bad.

-Eric

On Tue, Apr 30, 2013 at 5:43 PM, Terry P. <[email protected]> wrote:

> Eric, I'm really disappointed. Rather than writing anything at all, actually, I opted to run the RandomBatchWriter example program.
>
> It wasn't 35x faster.
>
> It was 52x faster.
>
> After all the excellent posts I've seen from you, I really expected a more precise guesstimation from you. ;-)
>
> Thanks for the gentle nudge to do better than Python and the Accumulo shell. At a million rows inserted in 13 seconds, I'm certain the Accumulo cluster I've set up can handle the 2-5K records per second max we expect to throw at it.
>
> Thanks again!
>
> On Tue, Apr 30, 2013 at 1:47 PM, Eric Newton <[email protected]> wrote:
>
>> I've probably written more Python than Java, so I understand. :-)
>>
>> I've used Jython for scripting tests. In unreleased versions (1.4.4 & 1.5.0) the Proxy will let you use the language of your choice.
>>
>> -Eric
>>
>> On Tue, Apr 30, 2013 at 2:43 PM, Terry P. <[email protected]> wrote:
>>
>>> Hi Eric,
>>> Thanks for the info. You've inspired me to dive into it in Java -- I had been using the Accumulo shell because I had a Python data generation script already in place and it was "faster" that way. But if a small Java program is going to be 35x "faster" than that, it makes no sense to bother with the shell!
>>>
>>> Thanks,
>>> Terry
>>>
>>> On Tue, Apr 30, 2013 at 11:01 AM, Eric Newton <[email protected]> wrote:
>>>
>>>> There's no need to flush... the shell is flushing after every single line.
>>>>
>>>> The flush you are invoking causes a minor compaction.
>>>>
>>>> If you wrote a quick Java program to ingest the data, the data would load about 35x faster.
>>>>
>>>> -Eric
>>>>
>>>> On Mon, Apr 29, 2013 at 6:40 PM, Terry P. <[email protected]> wrote:
>>>>
>>>>> Perhaps having a configuration item to limit the size of the shell_history.txt file would help avoid this in future?
>>>>>
>>>>> On Mon, Apr 29, 2013 at 5:37 PM, Terry P. <[email protected]> wrote:
>>>>>
>>>>>> You hit it, John -- on the NameNode the shell_history.txt file is 128MB, and the same goes for the DataNode that 99% of the data went to due to the key structure. On the other two DataNodes it was tiny, and both could log in fine (just my luck that the only DataNode I tried after the load was the fat one).
>>>>>>
>>>>>> So is --disable-tab-completion supposed to skip reading the shell_history.txt file? It appears that is not the case with 1.4.2, as it still dies with an OOM error.
>>>>>>
>>>>>> I now see that a better way to go would probably be to use the --execute-file switch to read the load file rather than pipe it to the shell. Correct?
>>>>>>
>>>>>> On Mon, Apr 29, 2013 at 5:04 PM, John Vines <[email protected]> wrote:
>>>>>>
>>>>>>> Depending on your answer to Eric's question, I wonder if your history is enough to blow it up. You may also want to check the size of ~/.accumulo/shell_history.txt and see if that is ginormous.
>>>>>>>
>>>>>>> On Mon, Apr 29, 2013 at 5:07 PM, Terry P. <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi John,
>>>>>>>> I attempted to start the shell with --disable-tab-completion but it still failed in an identical manner. What is that feature/option?
>>>>>>>>
>>>>>>>> The ACCUMULO_OTHER_OPTS var was set to "-Xmx256m -Xms64m" via the 2GB example config script. I upped the -Xmx256m to 512m and the shell started successfully, so thanks!
>>>>>>>>
>>>>>>>> What would cause the shell to need more than 256m of memory just to start? I'd like to understand how to determine an appropriate value to set ACCUMULO_OTHER_OPTS to.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Terry
>>>>>>>>
>>>>>>>> On Mon, Apr 29, 2013 at 2:21 PM, John Vines <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> The shell gets its memory config from ACCUMULO_OTHER_OPTS in the accumulo-env file. If, for some reason, the value was low or there was a lot of data being loaded for the tab completion stuff in the shell, it could die. You can try upping that value in the file or try running the shell with "--disable-tab-completion" to see if that helps.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 29, 2013 at 3:02 PM, Terry P. <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Greetings folks,
>>>>>>>>>> I have stood up our 8-node Accumulo 1.4.2 cluster consisting of 3 ZooKeepers, 1 NameNode (also running the Accumulo Master, Monitor, and GC), and 3 DataNodes / TabletServers (a Secondary NameNode with an Alternate Accumulo Master process will follow). The initial config files were copied from the 2GB/native-standalone directory.
>>>>>>>>>>
>>>>>>>>>> For a quick test I generated a text file to load 500,000 rows of sample data using the Accumulo shell. For lack of a better place to run it this first time, I ran it on the NameNode. The script performs a flush every 10,000 records (about 30,000 entries). After the load finished, when I attempt to log in to the Accumulo shell on the NameNode, I get the error:
>>>>>>>>>>
>>>>>>>>>> [root@edib-namenode ~]# /usr/lib/accumulo/bin/accumulo shell -u $AUSER -p $AUSERPWD
>>>>>>>>>> #
>>>>>>>>>> # java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>>>>>>>> # Executing /bin/sh -c "kill -9 24899"...
>>>>>>>>>> Killed
>>>>>>>>>>
>>>>>>>>>> The performance of that test was pretty poor at about 160/second (somewhat expected, as it was just one thread), so to keep moving I generated 3 different load files and ran one on each of the 3 DataNodes / TabletServers. Performance was much better, sustaining 1,400 per second. Again, the test data load files have flush commands every 10,000 records (30,000 entries), including at the end of the file.
>>>>>>>>>>
>>>>>>>>>> However, as with the NameNode, I now cannot log in to the Accumulo shell on any of the DataNodes either, as I get the same OutOfMemoryError.
>>>>>>>>>>
>>>>>>>>>> My /etc/security/limits.conf file is set with 64000 for nofile and 32000 for nproc for the hdfs user (which is also running Accumulo; I haven't split Accumulo out yet).
>>>>>>>>>>
>>>>>>>>>> I don't see any errors in the tserver or logger logs (standard and debug) or any info related to the shell failing to load. I'm at a loss as to where to look. The servers have 16GB of memory, and each has about 14GB currently free.
>>>>>>>>>>
>>>>>>>>>> Any help would be greatly appreciated.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Terry
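The throughput figures traded in this thread can be checked with a little arithmetic. The sketch below uses only numbers stated in the messages (1,000,000 rows in 13 seconds with RandomBatchWriter, the 33,388 tweets/second peak, and the 5K records/second upper estimate); the class name `IngestRates` is made up for illustration. The 52x figure is not recomputed, since its exact baseline isn't stated.

```java
import java.util.Locale;

// Worked arithmetic for the ingest rates quoted in the thread.
// Every input is a figure stated in the messages; nothing here talks to Accumulo.
public class IngestRates {

    /** Average rows per second for a load of {@code rows} that took {@code seconds}. */
    static double rate(long rows, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return rows / seconds;
    }

    public static void main(String[] args) {
        double batchWriter = rate(1_000_000L, 13.0); // RandomBatchWriter: 1M rows in 13 s
        double tweetPeak = 33_388.0;                 // peak tweet rate quoted in the thread
        double expectedMax = 5_000.0;                // upper end of the expected 2-5K records/s

        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.println(String.format(Locale.ROOT, "BatchWriter rate:  %.0f rows/s", batchWriter));
        System.out.println(String.format(Locale.ROOT, "vs. tweet peak:    %.2fx", batchWriter / tweetPeak));
        System.out.println(String.format(Locale.ROOT, "headroom at 5K/s:  %.1fx", batchWriter / expectedMax));
    }
}
```

Running it gives about 76,923 rows/s, roughly 2.30 times the quoted tweet peak (so "twice that rate" is a fair round-down), and about 15x headroom over the 5K records/second upper estimate.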
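John's suggestion to check the size of ~/.accumulo/shell_history.txt can be scripted. This is a minimal sketch with a hypothetical class name (`ShellHistoryCheck`) and a hypothetical threshold, not an Accumulo feature. It assumes, from what the thread reports (a 128MB history breaking a shell run with -Xmx256m that then started with 512m), that the history is read fully into the heap as UTF-16 Java strings at roughly 2 bytes per character, and it flags the file once that estimate exceeds half of -Xmx.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports the size of the Accumulo shell history file and
// flags it when it looks large relative to the shell's -Xmx heap ceiling.
public class ShellHistoryCheck {

    /** History location mentioned in the thread: ~/.accumulo/shell_history.txt */
    static Path defaultHistoryPath() {
        return Paths.get(System.getProperty("user.home"), ".accumulo", "shell_history.txt");
    }

    /** Size of the file in bytes, or 0 if it does not exist. */
    static long sizeOrZero(Path file) throws IOException {
        return Files.exists(file) ? Files.size(file) : 0L;
    }

    /**
     * Heuristic (an assumption, not documented Accumulo behavior): estimate the
     * in-heap size as 2x the on-disk size and flag it above half of -Xmx.
     */
    static boolean looksRisky(long historyBytes, long xmxBytes) {
        return historyBytes * 2 > xmxBytes / 2;
    }

    public static void main(String[] args) throws IOException {
        Path history = args.length > 0 ? Paths.get(args[0]) : defaultHistoryPath();
        // Default of 256 MB matches the -Xmx256m from the 2GB example config in the thread.
        long xmxMb = args.length > 1 ? Long.parseLong(args[1]) : 256L;
        long bytes = sizeOrZero(history);
        System.out.println(history + ": " + bytes + " bytes");
        if (looksRisky(bytes, xmxMb * 1024L * 1024L)) {
            System.out.println("History looks large for -Xmx" + xmxMb
                    + "m; consider trimming it or raising ACCUMULO_OTHER_OPTS.");
        }
    }
}
```

With the thread's numbers, a 128MB history is flagged at -Xmx256m and not at -Xmx512m, which matches what Terry observed; treat the cutoff as a starting point, not a sizing rule.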
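Terry proposes a configuration item to cap the size of shell_history.txt; the thread doesn't show 1.4.2 offering one, so as a stopgap the file could be trimmed by hand. This is a minimal sketch assuming the history is plain text with one command per line and that no shell session is open while it is rewritten; the class name `TrimShellHistory` and the 1,000-line default are arbitrary choices for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical workaround: keep only the newest N lines of the shell history file.
// Assumes one command per line; run it while no Accumulo shell is open.
public class TrimShellHistory {

    // ISO-8859-1 maps every byte to one char, so arbitrary bytes round-trip unchanged.
    private static final Charset BYTES = StandardCharsets.ISO_8859_1;

    /** Rewrites {@code file} to hold only its last {@code maxLines} lines; returns lines kept. */
    static int trimToLastLines(Path file, int maxLines) throws IOException {
        if (maxLines < 0) {
            throw new IllegalArgumentException("maxLines must be >= 0");
        }
        // Stream the file, holding at most maxLines lines in memory at any time.
        Deque<String> tail = new ArrayDeque<>();
        try (BufferedReader in = Files.newBufferedReader(file, BYTES)) {
            String line;
            while ((line = in.readLine()) != null) {
                tail.addLast(line);
                if (tail.size() > maxLines) {
                    tail.removeFirst();
                }
            }
        }
        // Write to a temp file in the same directory, then swap it into place.
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "shell_history", ".tmp");
        try (BufferedWriter out = Files.newBufferedWriter(tmp, BYTES)) {
            for (String kept : tail) {
                out.write(kept);
                out.write('\n');
            }
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        return tail.size();
    }

    public static void main(String[] args) throws IOException {
        Path history = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), ".accumulo", "shell_history.txt");
        int maxLines = args.length > 1 ? Integer.parseInt(args[1]) : 1000;
        System.out.println("Kept " + trimToLastLines(history, maxLines) + " lines of " + history);
    }
}
```

Streaming with a bounded deque keeps the trimmer's own memory use proportional to the number of lines kept, so it can process a 128MB history without needing the kind of heap the shell itself ran out of.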
