I've probably written more Python than Java, so I understand. :-) I've used Jython for scripting tests. In the unreleased versions (1.4.4 and 1.5.0), the Proxy will let you use the language of your choice.
-Eric

On Tue, Apr 30, 2013 at 2:43 PM, Terry P. <[email protected]> wrote:
> Hi Eric,
> Thanks for the info. You've inspired me to dive into it in Java -- I had
> been using the Accumulo shell because I had a Python data generation script
> already in place and it was "faster" that way. But if a small Java program
> is going to be 35x "faster" than that, it makes no sense to bother with the
> shell!
>
> Thanks,
> Terry
>
> On Tue, Apr 30, 2013 at 11:01 AM, Eric Newton <[email protected]> wrote:
>
>> There's no need to flush... the shell is flushing after every single line.
>>
>> The flush you are invoking causes a minor compaction.
>>
>> If you wrote a quick Java program to ingest the data, the data would load
>> about 35x faster.
>>
>> -Eric
>>
>> On Mon, Apr 29, 2013 at 6:40 PM, Terry P. <[email protected]> wrote:
>>
>>> Perhaps having a configuration item to limit the size of the
>>> shell_history.txt file would help avoid this in future?
>>>
>>> On Mon, Apr 29, 2013 at 5:37 PM, Terry P. <[email protected]> wrote:
>>>
>>>> You hit it, John -- on the NameNode the shell_history.txt file is 128MB,
>>>> and the same is true on the DataNode that 99% of the data went to due to
>>>> the key structure. On the other two DataNodes it was tiny, and both could
>>>> log in fine (just my luck that the only DataNode I tried after the load
>>>> was the fat one).
>>>>
>>>> So is --disable-tab-completion supposed to skip reading the
>>>> shell_history.txt file? It appears that is not the case with 1.4.2, as it
>>>> still dies with an OOM error.
>>>>
>>>> I now see that a better way to go would probably be to use the
>>>> --execute-file switch to read the load file rather than pipe it to the
>>>> shell. Correct?
>>>>
>>>> On Mon, Apr 29, 2013 at 5:04 PM, John Vines <[email protected]> wrote:
>>>>
>>>>> Depending on your answer to Eric's question, I wonder if your history
>>>>> is enough to blow it up.
>>>>> You may also want to check the size of
>>>>> ~/.accumulo/shell_history.txt and see if that is ginormous.
>>>>>
>>>>> On Mon, Apr 29, 2013 at 5:07 PM, Terry P. <[email protected]> wrote:
>>>>>
>>>>>> Hi John,
>>>>>> I attempted to start the shell with --disable-tab-completion, but it
>>>>>> still failed in an identical manner. What is that feature/option?
>>>>>>
>>>>>> The ACCUMULO_OTHER_OPTS variable was set to "-Xmx256m -Xms64m" via the
>>>>>> 2GB example config script. I raised -Xmx256m to 512m and the shell
>>>>>> started successfully, so thanks!
>>>>>>
>>>>>> What would cause the shell to need more than 256m of memory just to
>>>>>> start? I'd like to understand how to determine an appropriate value
>>>>>> for ACCUMULO_OTHER_OPTS.
>>>>>>
>>>>>> Thanks,
>>>>>> Terry
>>>>>>
>>>>>> On Mon, Apr 29, 2013 at 2:21 PM, John Vines <[email protected]> wrote:
>>>>>>
>>>>>>> The shell gets its memory config from ACCUMULO_OTHER_OPTS in the
>>>>>>> accumulo-env file. If, for some reason, the value was low or there was
>>>>>>> a lot of data being loaded for the tab-completion stuff in the shell,
>>>>>>> it could die. You can try upping that value in the file or try running
>>>>>>> the shell with "--disable-tab-completion" to see if that helps.
>>>>>>>
>>>>>>> On Mon, Apr 29, 2013 at 3:02 PM, Terry P. <[email protected]> wrote:
>>>>>>>
>>>>>>>> Greetings folks,
>>>>>>>> I have stood up our 8-node Accumulo 1.4.2 cluster consisting of 3
>>>>>>>> ZooKeepers, 1 NameNode (also runs Accumulo Master, Monitor, and GC),
>>>>>>>> and 3 DataNodes / TabletServers (a Secondary NameNode with an
>>>>>>>> alternate Accumulo Master process will follow). The initial config
>>>>>>>> files were copied from the 2GB/native-standalone directory.
>>>>>>>>
>>>>>>>> For a quick test I have a text file I generated to load 500,000
>>>>>>>> rows of sample data using the Accumulo shell.
>>>>>>>> For lack of a better place to run it this first time, I ran it on the
>>>>>>>> NameNode. The script performs flushes every 10,000 records (about
>>>>>>>> 30,000 entries). After the load finished, when I attempt to log in to
>>>>>>>> the Accumulo shell on the NameNode, I get the error:
>>>>>>>>
>>>>>>>> [root@edib-namenode ~]# /usr/lib/accumulo/bin/accumulo shell -u $AUSER -p $AUSERPWD
>>>>>>>> #
>>>>>>>> # java.lang.OutOfMemoryError: Java heap space
>>>>>>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>>>>>> # Executing /bin/sh -c "kill -9 24899"...
>>>>>>>> Killed
>>>>>>>>
>>>>>>>> The performance of that test was pretty poor at about 160/second
>>>>>>>> (somewhat expected, as it was just one thread), so to keep moving I
>>>>>>>> generated 3 different load files and ran one on each of the 3
>>>>>>>> DataNodes / TabletServers. Performance was much better, sustaining
>>>>>>>> 1,400 per second. Again, the test data load files have flush commands
>>>>>>>> every 10,000 records (30,000 entries), including at the end of the
>>>>>>>> file.
>>>>>>>>
>>>>>>>> However, as with the NameNode, I now cannot log in to the Accumulo
>>>>>>>> shell on any of the DataNodes either, as I get the same
>>>>>>>> OutOfMemoryError.
>>>>>>>>
>>>>>>>> My /etc/security/limits.conf file is set with 64000 for nofile and
>>>>>>>> 32000 for nproc for the hdfs user (which is also running Accumulo; I
>>>>>>>> haven't split Accumulo out yet).
>>>>>>>>
>>>>>>>> I don't see any errors in the tserver or logger logs (standard and
>>>>>>>> debug) or any info related to the shell failing to load. I'm at a
>>>>>>>> loss as to where to look. The servers have 16GB of memory, and each
>>>>>>>> has about 14GB currently free.
>>>>>>>>
>>>>>>>> Any help would be greatly appreciated.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Terry
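Terry asked why the shell needed more than 256m just to start. A rough, back-of-the-envelope estimate helps here. It is not a description of the shell's internals. It assumes that every line of shell_history.txt is read into memory as a `java.lang.String` at startup. It also assumes the file is ASCII (one byte per character on disk), that "128MB" means 128 MiB, and that the JVM stores each `char` in 2 bytes. That last point holds for the Java 6/7 JVMs of the time, while Java 9+ compact strings may use 1 byte. The class `HistoryHeapEstimate` and its methods are hypothetical.

```java
// Rough estimate only. Assumptions (not confirmed by the thread): each history
// line is held as a java.lang.String on a JVM without compact strings
// (pre-Java 9), so each char costs 2 bytes, and the file is ASCII.
public class HistoryHeapEstimate {

    static final long MIB = 1024L * 1024L;

    // Approximate bytes of char data for a file of the given size.
    // Ignores the newlines stripped from each line and all object overhead.
    static long charDataBytes(long fileBytes) {
        return fileBytes * 2;
    }

    // True if the char data alone stays below the heap limit. Real usage is
    // higher (String and array headers, the shell's own classes), so "true"
    // does not guarantee that the shell will start.
    static boolean fitsUnder(long fileBytes, long heapBytes) {
        return charDataBytes(fileBytes) < heapBytes;
    }

    public static void main(String[] args) {
        long history = 128 * MIB; // size reported for shell_history.txt
        long heap = 256 * MIB;    // -Xmx256m from the 2GB example config
        System.out.println("char data: " + charDataBytes(history) / MIB + " MiB");
        System.out.println("fits under -Xmx256m: " + fitsUnder(history, heap));
    }
}
```

Under those assumptions, the character data of a 128 MiB history alone is about 256 MiB. That already matches the entire -Xmx256m heap before any per-object overhead is counted. This is consistent with the OutOfMemoryError at 256m and the successful start at 512m.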

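Terry suggested a configuration item to limit the size of shell_history.txt. Until something like that exists, the file could be trimmed by hand. Below is a minimal sketch of a hypothetical stand-alone tool; `TrimShellHistory` is not part of Accumulo, and the default of 1000 lines is arbitrary. It assumes Java 8 or later and that the history is plain line-oriented text at ~/.accumulo/shell_history.txt, the path John mentions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-alone tool (not part of Accumulo): keeps only the last
// N lines of the shell history file.
public class TrimShellHistory {

    // ISO-8859-1 maps every byte to exactly one char, so arbitrary bytes
    // survive the read/write round trip (line endings are normalized).
    private static final Charset BYTES = StandardCharsets.ISO_8859_1;

    // Streams the input with a sliding window of the last maxLines lines, so
    // memory use is bounded by the window rather than by the file size.
    static List<String> keepLast(BufferedReader in, int maxLines) {
        if (maxLines <= 0) {
            throw new IllegalArgumentException("maxLines must be positive");
        }
        Deque<String> window = new ArrayDeque<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (window.size() == maxLines) {
                    window.removeFirst(); // drop the oldest line
                }
                window.addLast(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ArrayList<>(window);
    }

    static List<String> readLast(Path file, int maxLines) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, BYTES)) {
            return keepLast(in, maxLines);
        }
    }

    // Writes the kept lines to a temporary sibling file, then moves it into place.
    static void trim(Path file, int maxLines) throws IOException {
        List<String> kept = readLast(file, maxLines);
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, kept, BYTES);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path history = Paths.get(System.getProperty("user.home"), ".accumulo", "shell_history.txt");
        int maxLines = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        if (Files.exists(history)) {
            trim(history, maxLines);
        }
    }
}
```

Run it while no shell session is open, so that a running shell does not write to the file while it is being replaced. The sliding window is a deliberate choice: the file's size is the problem, so the tool avoids loading the whole file the way the shell apparently does.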