Hi Todd,

Can you please give the URL of this fix?
Thanks,
Sean

On Sat, Nov 13, 2010 at 9:10 PM, Todd Lipcon <[email protected]> wrote:

> Hi Friso,
>
> I think I identified the issue. As you suspected, we were unnecessarily
> allocating a lot of native byte buffers in the LZO code where we weren't
> before.
>
> I just pushed a fix to my LZO repository and bumped the version number
> to 0.4.7.
>
> If you have a chance to test this in a dev environment, that would be
> great. I will try to test it myself this week. (Unfortunately, I wasn't
> able to reproduce the issue yet.)
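>
> In sketch form, the idea is to reuse the existing direct buffer across
> reinits when it is already large enough, rather than allocating a fresh
> one every time. Illustrative names only, not the actual diff:
>
>     import java.nio.ByteBuffer;
>
>     public class BufferReuseSketch {
>       private ByteBuffer buffer; // direct buffer backing the native codec
>
>       // Called on every codec (re)init. The leaky behavior is equivalent
>       // to calling allocateDirect() unconditionally here, piling up
>       // native allocations until the GC collects the old buffers.
>       void realloc(int capacity) {
>         if (buffer == null || buffer.capacity() < capacity) {
>           buffer = ByteBuffer.allocateDirect(capacity); // only when needed
>         }
>         buffer.clear(); // otherwise just reset and reuse what we have
>       }
>     }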
>
> Thanks
> -Todd
>
> On Fri, Nov 12, 2010 at 4:09 PM, Todd Lipcon <[email protected]> wrote:
>
> > Hey Friso,
> >
> > Thanks so much for the details. I am starting to imagine it could
> > indeed be a codec leak - especially since you have some cells that run
> > into the MBs, maybe it's expanding some buffers to 64MB.
> >
> > Let me try to do some tests to reproduce it here in the next week or so.
> >
> > Anyone else seen this issue?
> >
> > Thanks
> > -Todd
> >
> > On Fri, Nov 12, 2010 at 1:19 AM, Friso van Vollenhoven
> > <[email protected]> wrote:
> >
> >> Hi Todd,
> >>
> >> I am afraid I no longer have the broken setup around, because we
> >> really need a working one right now. We need to demo at a conference
> >> next week, and until after that, all changes are frozen both on dev
> >> and prod (so we can use dev as a fallback). Later on I could maybe
> >> try some more things on our dev boxes.
> >>
> >> If you are doing a repro, here's the stuff you'd probably want to
> >> know. The workload is write only. No reads happening at the same
> >> time. No other active clients. It is an initial import of data. We do
> >> the insertions in an MR job, from the reducers. The total volume is
> >> about 11 billion puts across roughly 450K rows per table (we have a
> >> many-columns-per-row data model) across 15 tables, all using LZO.
> >> Qualifiers are some 50 bytes. Values generally range from a few KBs,
> >> up to MBs in rare cases. The row keys have a time-related part at the
> >> start, so I know the keyspace in advance, and I create the empty
> >> tables with pre-created regions (40 regions) across the keyspace to
> >> get decent distribution from the start of the job. In order not to
> >> overload HBase, I run the job with only 15 reducers, so at most 15
> >> concurrent clients are active. Other settings: max file size is 1GB,
> >> HFile block size is the default 64K, client side buffer is 16M,
> >> memstore flush size is 128M, compaction threshold is 5, blocking
> >> store files is 9, memstore upper limit is 20%, lower limit 15%, block
> >> cache 40%. During the run, the RSes never report more than 5GB of
> >> heap usage from the UI, which makes sense, because the block cache is
> >> not touched. On a healthy run with somewhat conservative settings
> >> right now, HBase reports on average about 380K requests per second in
> >> the master UI.
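> >>
> >> For reference, the pre-splitting is done along these lines. This is a
> >> minimal sketch against the HBaseAdmin API; the table name, family,
> >> and split points are made up (ours derive from the time-related key
> >> prefix):
> >>
> >>     import org.apache.hadoop.hbase.HBaseConfiguration;
> >>     import org.apache.hadoop.hbase.HColumnDescriptor;
> >>     import org.apache.hadoop.hbase.HTableDescriptor;
> >>     import org.apache.hadoop.hbase.client.HBaseAdmin;
> >>     import org.apache.hadoop.hbase.util.Bytes;
> >>
> >>     public class CreatePreSplitTable {
> >>       public static void main(String[] args) throws Exception {
> >>         HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
> >>         HTableDescriptor table = new HTableDescriptor("mytable");
> >>         table.addFamily(new HColumnDescriptor("d"));
> >>
> >>         // 39 split keys yield 40 regions spread over the keyspace.
> >>         byte[][] splits = new byte[39][];
> >>         for (int i = 0; i < splits.length; i++) {
> >>           splits[i] = Bytes.toBytes(String.format("%02d", i + 1));
> >>         }
> >>         admin.createTable(table, splits);
> >>       }
> >>     }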
> >>
> >> The cluster has 8 workers running TT, DN, RS, and another JVM process
> >> for our own software that sits in front of HBase. Workers are dual
> >> quad cores with 64GB RAM and 10x 600GB disks (we decided to scale the
> >> number of seeks we can do concurrently). Disks are quite fast: 10K
> >> RPM. MR task VMs get 1GB of heap, TT and DN also. The RS gets 16GB of
> >> heap, and our own software too. We run 8 mappers and 4 reducers per
> >> node. So at the absolute max, we should have 46GB of allocated heap.
> >> That leaves 18GB for JVM overhead, native allocations, and the OS. We
> >> run Linux 2.6.18-194.11.4.el5. I think it is CentOS, but I didn't do
> >> the installs myself.
> >>
> >> I tried numerous different settings, both more extreme and more
> >> conservative, to get the thing working, but in the end it always ends
> >> up swapping. I should have tried a run without LZO, of course, but I
> >> was out of time by then.
> >>
> >> Cheers,
> >> Friso
> >>
> >> On 12 nov 2010, at 07:06, Todd Lipcon wrote:
> >>
> >> > Hrm, any chance you can run with a smaller heap and get a jmap
> >> > dump? The Eclipse MAT tool is also super nice for looking at this
> >> > stuff, if indeed they are Java objects.
> >> >
> >> > What kind of workload are you using? Read mostly? Write mostly?
> >> > Mixed? I will try to repro.
> >> >
> >> > -Todd
> >> >
> >> > On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven
> >> > <[email protected]> wrote:
> >> >
> >> >> I figured the same. I also did a run with CMS instead of G1. Same
> >> >> results.
> >> >>
> >> >> I also did a run with the RS heap tuned down to 12GB and 8GB, but
> >> >> given enough time the process still grows over 40GB in size.
> >> >>
> >> >> Friso
> >> >>
> >> >> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
> >> >>
> >> >>> Can you try running this with CMS GC instead of G1GC? G1 still
> >> >>> has some bugs... 64M sounds like it might be G1 "regions"?
> >> >>>
> >> >>> -Todd
> >> >>>
> >> >>> On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven
> >> >>> <[email protected]> wrote:
> >> >>>
> >> >>>> Hi All,
> >> >>>>
> >> >>>> (This is all about CDH3, so I am not sure whether it should go
> >> >>>> on this list, but I figure it is at least interesting for people
> >> >>>> trying the same.)
> >> >>>>
> >> >>>> I've recently tried CDH3 on a new cluster, from RPMs, with the
> >> >>>> hadoop-lzo fork from https://github.com/toddlipcon/hadoop-lzo.
> >> >>>> Everything works like a charm initially, but after some time
> >> >>>> (minutes to at most an hour), the RS JVM process memory grows to
> >> >>>> more than twice the given heap size and beyond. I have seen a RS
> >> >>>> with a 16GB heap grow to 55GB virtual size. At some point,
> >> >>>> everything starts swapping, GC times go into the minutes, and
> >> >>>> everything dies or is considered dead by the master.
> >> >>>>
> >> >>>> I did a pmap -x on the RS process, and that shows a lot of
> >> >>>> allocated blocks of about 64M each. There are about 500 of
> >> >>>> these, which is 32GB in total. See: http://pastebin.com/8pgzPf7b
> >> >>>> (bottom of the file; the blocks of about 1M on top are probably
> >> >>>> thread stacks). Unfortunately, Linux shows the native heap as
> >> >>>> anon blocks, so I cannot link it to a specific lib or anything.
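> >> >>>>
> >> >>>> For what it's worth, the anon blocks themselves are easy to
> >> >>>> reproduce: a toy program like the one below (purely illustrative,
> >> >>>> not our code) shows the same pmap signature, because direct
> >> >>>> ByteBuffers are carved out of the native heap:
> >> >>>>
> >> >>>>     import java.nio.ByteBuffer;
> >> >>>>     import java.util.ArrayList;
> >> >>>>     import java.util.List;
> >> >>>>
> >> >>>>     // Run with e.g. -XX:MaxDirectMemorySize=1g, then inspect the
> >> >>>>     // process with `pmap -x <pid>`: each buffer appears as a
> >> >>>>     // ~64M anon mapping with no library attached to it.
> >> >>>>     public class AnonBlocks {
> >> >>>>       public static void main(String[] args) throws Exception {
> >> >>>>         List<ByteBuffer> held = new ArrayList<ByteBuffer>();
> >> >>>>         for (int i = 0; i < 8; i++) {
> >> >>>>           held.add(ByteBuffer.allocateDirect(64 * 1024 * 1024));
> >> >>>>         }
> >> >>>>         System.in.read(); // keep the process alive for pmap
> >> >>>>       }
> >> >>>>     }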
> >> >>>>
> >> >>>> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said
> >> >>>> URL, the one which has the reinit() support). I run Java 6u21
> >> >>>> with the G1 garbage collector, which has been running fine for
> >> >>>> some weeks now. The full command line is:
> >> >>>>
> >> >>>>     java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
> >> >>>>       -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
> >> >>>>       -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails
> >> >>>>       -XX:+PrintGCDateStamps
> >> >>>>       -Xloggc:/export/logs/hbase/gc-hbase.log
> >> >>>>       -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
> >> >>>>       -Djava.net.preferIPv4Stack=true
> >> >>>>       -Dhbase.log.dir=/export/logs/hbase
> >> >>>>       -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
> >> >>>>       -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase
> >> >>>>       -Dhbase.r
> >> >>>>
> >> >>>> I searched the HBase source for something that could point to
> >> >>>> native heap usage (like ByteBuffer#allocateDirect(...)), but I
> >> >>>> could not find anything. The thread count is about 185 (I have
> >> >>>> 100 handlers), so nothing strange there either.
> >> >>>>
> >> >>>> Question is: could this be HBase, or is this a problem with
> >> >>>> hadoop-lzo?
> >> >>>>
> >> >>>> I have currently downgraded to a version known to work, because
> >> >>>> we have a demo coming up. But I am still interested in the
> >> >>>> answer.
> >> >>>>
> >> >>>> Regards,
> >> >>>> Friso
> >> >>>
> >> >>> --
> >> >>> Todd Lipcon
> >> >>> Software Engineer, Cloudera
> >> >
> >> > --
> >> > Todd Lipcon
> >> > Software Engineer, Cloudera
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--Sean