Hi Eric,

Thanks for the info. You've inspired me to dive into it in Java -- I had been using the Accumulo shell because I already had a Python data generation script in place and it was "faster" that way. But if a small Java program is going to be 35x "faster" than that, it makes no sense to bother with the shell!
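
In the meantime, to keep ~/.accumulo/shell_history.txt from ballooning again before I start the shell, something like the following might do. It is only a rough sketch: the class name HistoryTrimmer, the keepLastLines helper, and the 1000-line default are my own placeholders, not anything Accumulo ships, and I am assuming the shell does not mind the file being rewritten while it is not running.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Accumulo: caps a history file at its last N lines.
final class HistoryTrimmer {

    // ISO-8859-1 maps every byte to exactly one char, so unusual bytes in the
    // file cannot cause a decoding error and are written back unchanged.
    private static final Charset CHARSET = StandardCharsets.ISO_8859_1;

    /**
     * Keeps only the last maxLines lines of the file.
     * Returns the number of lines dropped (0 if the file was already short enough).
     */
    static long keepLastLines(Path file, int maxLines) throws IOException {
        if (maxLines < 1) {
            throw new IllegalArgumentException("maxLines must be at least 1");
        }
        Deque<String> tail = new ArrayDeque<>();
        long total = 0;
        // Stream the file line by line; only the retained tail is held in memory.
        try (BufferedReader in = Files.newBufferedReader(file, CHARSET)) {
            String line;
            while ((line = in.readLine()) != null) {
                total++;
                if (tail.size() == maxLines) {
                    tail.removeFirst(); // discard the oldest retained line
                }
                tail.addLast(line);
            }
        }
        long dropped = total - tail.size();
        if (dropped > 0) {
            // Default options truncate the existing file and rewrite it in place.
            Files.write(file, tail, CHARSET);
        }
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        // Default path is the one mentioned in this thread; the 1000-line cap is an arbitrary assumption.
        Path history = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), ".accumulo", "shell_history.txt");
        int maxLines = args.length > 1 ? Integer.parseInt(args[1]) : 1000;
        if (Files.exists(history)) {
            System.out.println("Dropped " + keepLastLines(history, maxLines) + " lines from " + history);
        }
    }
}
```

Running `java HistoryTrimmer` with no arguments would trim the default history file; a different path and line cap can be passed as the first and second arguments.
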

Thanks,
Terry

On Tue, Apr 30, 2013 at 11:01 AM, Eric Newton <[email protected]> wrote:

> There's no need to flush... the shell is flushing after every single line.
>
> The flush you are invoking causes a minor compaction.
>
> If you wrote a quick java program to ingest the data, the data would load
> about 35x faster.
>
> -Eric
>
> On Mon, Apr 29, 2013 at 6:40 PM, Terry P. <[email protected]> wrote:
>
>> Perhaps having a configuration item to limit the size of the
>> shell_history.txt file would help avoid this in future?
>>
>> On Mon, Apr 29, 2013 at 5:37 PM, Terry P. <[email protected]> wrote:
>>
>>> You hit it John -- on the NameNode the shell_history.txt file is 128MB,
>>> and same thing on the DataNode that 99% of the data went to due to the
>>> key structure. On the other two datanodes it was tiny, and both could
>>> login fine (just my luck that the only datanode I tried after the load
>>> was the fat one).
>>>
>>> So is --disable-tab-completion supposed to skip reading the
>>> shell_history.txt file? It appears that is not the case with 1.4.2 as it
>>> still dies with OOM error.
>>>
>>> I now see that a better way to go would probably be to use
>>> --execute-file switch to read the load file rather than pipe it to the
>>> shell. Correct?
>>>
>>> On Mon, Apr 29, 2013 at 5:04 PM, John Vines <[email protected]> wrote:
>>>
>>>> Depending on your answer to Eric's question, I wonder if your history
>>>> is enough to blow it up. You may also want to check the size of
>>>> ~/.accumulo/shell_history.txt and see if that is ginormous.
>>>>
>>>> On Mon, Apr 29, 2013 at 5:07 PM, Terry P. <[email protected]> wrote:
>>>>
>>>>> Hi John,
>>>>> I attempted to start the shell with --disable-tab-completion but it
>>>>> still failed in an identical manner. What is that feature/option?
>>>>>
>>>>> The ACCUMULO_OTHER_OPTS var was set to "-Xmx256m -Xms64m" via the 2GB
>>>>> example config script. I upped the -Xmx256m to 512m and the shell
>>>>> started successfully, so thanks!
>>>>>
>>>>> What would cause the shell to need more than 256m of memory just to
>>>>> start? I'd like to understand how to determine an appropriate value to
>>>>> set ACCUMULO_OTHER_OPTS to.
>>>>>
>>>>> Thanks,
>>>>> Terry
>>>>>
>>>>> On Mon, Apr 29, 2013 at 2:21 PM, John Vines <[email protected]> wrote:
>>>>>
>>>>>> The shell gets it's memory config from the accumulo-env file from
>>>>>> ACCUMULO_OTHER_OPTS. If, for some reason, the value was low or there
>>>>>> was a lot of data being loaded for the tab completion stuff in the
>>>>>> shell, it could die. You can try upping that value in the file or try
>>>>>> running the shell with "--disable-tab-completion" to see if that
>>>>>> helps.
>>>>>>
>>>>>> On Mon, Apr 29, 2013 at 3:02 PM, Terry P. <[email protected]> wrote:
>>>>>>
>>>>>>> Greetings folks,
>>>>>>> I have stood up our 8-node Accumulo 1.4.2 cluster consisting of 3
>>>>>>> ZooKeepers, 1 NameNode (also runs Accumulo Master, Monitor, and GC),
>>>>>>> and 3 DataNodes / TabletServers (Secondary NameNode with Alternate
>>>>>>> Accumulo Master process will follow). The initial config files were
>>>>>>> copied from the 2GB/native-standalone directory.
>>>>>>>
>>>>>>> For a quick test I have a text file I generated to load 500,000 rows
>>>>>>> of sample data using the Accumulo shell. For lack of a better place
>>>>>>> to run it this first time, I ran it on the NameNode. The script
>>>>>>> performs flushes every 10,000 records (about 30,000 entries). After
>>>>>>> the load finished, when I attempt to login to the Accumulo Shell on
>>>>>>> the NameNode, I get the error:
>>>>>>>
>>>>>>> [root@edib-namenode ~]# /usr/lib/accumulo/bin/accumulo shell -u $AUSER -p $AUSERPWD
>>>>>>> #
>>>>>>> # java.lang.OutOfMemoryError: Java heap space
>>>>>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>>>>> # Executing /bin/sh -c "kill -9 24899"...
>>>>>>> Killed
>>>>>>>
>>>>>>> The performance of that test was pretty poor at about 160/second
>>>>>>> (somewhat expected, as it was just one thread) so to keep moving I
>>>>>>> generated 3 different load files and ran one on each of the 3
>>>>>>> DataNodes / TabletServers. Performance was much better, sustaining
>>>>>>> 1,400 per second. Again, the test data load files have flush
>>>>>>> commands every 10,000 records (30,000 entries), including at the end
>>>>>>> of the file.
>>>>>>>
>>>>>>> However, as with the NameNode, now I cannot login to the Accumulo
>>>>>>> shell on any of the DataNodes either, as I get the same
>>>>>>> OutOfMemoryError.
>>>>>>>
>>>>>>> My /etc/security/limits.conf file is set with 64000 for nofile and
>>>>>>> 32000 for nproc for the hdfs user (which is also running Accumulo, I
>>>>>>> haven't split accumulo out yet).
>>>>>>>
>>>>>>> I don't see any errors in the tserver or logger logs (standard and
>>>>>>> debug) or any info related to the shell failing to load. I'm at a
>>>>>>> loss with respect to where to look. The servers have 16GB of memory,
>>>>>>> and each has about 14GB currently free.
>>>>>>>
>>>>>>> Any help would be greatly appreciated.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Terry
