Ok, a bit more info. I set -XX:+HeapDumpOnOutOfMemoryError and took a look at the heap dump. The thread that caused the OOM is reading a column family bloom filter from the CacheableBlockFile. The class taking up the memory is long[], which seems consistent with a bloom filter. Does this sound right? Any guidance on bloom-filter-related settings I could tweak to alleviate this issue?
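For reference, this is roughly what I'm planning to try in order to rule the bloom filters out. It's only a sketch, not tested code: "myTable", the instance/zookeeper names, and the credentials are placeholders for our setup, and I'm assuming the standard per-table table.bloom.* properties are what control this.

import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.admin.TableOperations;

public class DisableBloomFilters {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for our cluster.
        Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
                .getConnector("root", "secret".getBytes());
        TableOperations ops = conn.tableOperations();

        // Print the current bloom filter settings for the suspect table.
        for (Entry<String, String> prop : ops.getProperties("myTable")) {
            if (prop.getKey().startsWith("table.bloom")) {
                System.out.println(prop.getKey() + " = " + prop.getValue());
            }
        }

        // Disable bloom filters on the table. I'm assuming this also stops
        // them from being loaded during scans, not just written for new files.
        ops.setProperty("myTable", "table.bloom.enabled", "false");
    }
}

If the tablet servers stop OOMing after this, that would point pretty strongly at the bloom filter loading.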
On Thu, Nov 29, 2012 at 2:24 PM, Anthony Fox <[email protected]> wrote:
> Since the scan involves an intersecting iterator, it has to scan the
> entire row range. Also, it's not even very many concurrent clients -
> between 5 and 10. Should I turn compression off on this table, or is that
> a bad idea in general?
>
> On Thu, Nov 29, 2012 at 2:22 PM, Keith Turner <[email protected]> wrote:
>
>> On Thu, Nov 29, 2012 at 2:09 PM, Anthony Fox <[email protected]> wrote:
>>
>>> We're not on 1.4 yet, unfortunately. Are there any config params I can
>>> tweak to manipulate the compressor pool?
>>
>> Not that I know of, but it's been a while since I looked at that.
>>
>>> On Thu, Nov 29, 2012 at 1:49 PM, Keith Turner <[email protected]> wrote:
>>>
>>>> On Thu, Nov 29, 2012 at 12:20 PM, Anthony Fox <[email protected]> wrote:
>>>>
>>>>> Compacting down to a single file is not feasible - there's about 70G
>>>>> in 255 tablets across 15 tablet servers. Is there another way to tune
>>>>> the compressor pool or another mechanism to verify that this is the issue?
>>>>
>>>> I suppose another way to test this would be to run a lot of concurrent
>>>> scans, but not enough to kill the tserver. Then get a heap dump of the
>>>> tserver and see if it contains a lot of 128k or 256k (I can't remember the
>>>> exact size) byte arrays that are referenced by the compressor pool.
>>>>
>>>>> On Thu, Nov 29, 2012 at 12:09 PM, Keith Turner <[email protected]> wrote:
>>>>>
>>>>>> On Thu, Nov 29, 2012 at 11:14 AM, Anthony Fox <[email protected]> wrote:
>>>>>>
>>>>>>> I am experiencing some issues running multiple parallel scans
>>>>>>> against Accumulo. Running single scans works just fine, but when I ramp
>>>>>>> up the number of simultaneous clients, my tablet servers die due to
>>>>>>> running out of heap space. I've tried raising max heap to 4G, which
>>>>>>> should be more than enough, but I still see this error. I've tried with
>>>>>>> table.cache.block.enable=false, table.cache.index.enable=false, and
>>>>>>> table.scan.cache.enable=false, and all combinations of caching enabled
>>>>>>> as well.
>>>>>>>
>>>>>>> My scans involve a custom intersecting iterator that maintains no
>>>>>>> more state than the top key and value. The scans also do a bit of
>>>>>>> aggregation on column qualifiers, but the result is small and the number
>>>>>>> of returned entries is only in the dozens. The size of each returned
>>>>>>> value is only around 500 bytes.
>>>>>>>
>>>>>>> Any ideas why this may be happening or where to look for further
>>>>>>> info?
>>>>>>
>>>>>> One known issue is Hadoop's compressor pool. If you have a tablet
>>>>>> with 8 files and you query 10 terms, you will allocate 80 decompressors.
>>>>>> Each decompressor uses 128K. If you have 10 concurrent queries, 10 terms,
>>>>>> and 10 files, then you will allocate 1000 decompressors. These
>>>>>> decompressors come from a pool that never shrinks, so if you allocate
>>>>>> 1000 at the same time, they will stay around.
>>>>>>
>>>>>> Try compacting your table down to one file and rerun your query just
>>>>>> to see if that helps. If it does, then that's an important clue.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Anthony
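P.S. Just to write out the decompressor math Keith describes above, since it's relevant to the heap numbers we're seeing. This is only a rough sketch: the 128K-per-decompressor figure is the one quoted in the thread, not something I've verified, and the file/term/query counts are example inputs.

public class DecompressorPoolMath {
    public static void main(String[] args) {
        int filesPerTablet = 10;      // files each query touches
        int termsPerQuery = 10;       // sources opened by the intersecting iterator
        int concurrentQueries = 10;

        long decompressors = (long) filesPerTablet * termsPerQuery * concurrentQueries;
        long bytesPerDecompressor = 128 * 1024; // ~128K each, per the thread
        long poolBytes = decompressors * bytesPerDecompressor;

        // 10 * 10 * 10 = 1000 decompressors, roughly 125 MB that the pool never gives back.
        System.out.printf("%d decompressors -> ~%d MB held by the pool%n",
                decompressors, poolBytes / (1024 * 1024));
    }
}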
