On Tue, Aug 9, 2011 at 1:44 AM, Stack <[email protected]> wrote:

> On Sun, Aug 7, 2011 at 12:28 PM, Oleg Ruchovets <[email protected]>
> wrote:
> > *Here is one of the region server logs:*
> > http://pastebin.com/raw.php?i=VF2bSMYd
> >
>
> I see this Oleg: "Caused by: java.lang.OutOfMemoryError: Java heap space"
>
>

Yes, I saw this too, but I think it is not the root of the problem; rather, it
is a result of the server being busy compacting files and not being able to
handle insertions into HBase at the same time. Does that make sense?
If not, what is the way to get more details about this issue? I am thinking
about profiling, but we have 10 machines and I don't know which region server
will hit the OutOfMemoryError. What is the best practice for profiling the
whole HBase grid? The rough idea I had is sketched below.
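(Just a sketch of what I mean, not something we run yet: it assumes the 0.90
client API exposes each region server's heap usage through
ClusterStatus/HServerLoad, and the 80% threshold is arbitrary.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HServerInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Rough sketch: ask the master for the cluster status and print each region
// server's heap usage, to spot which of the 10 servers is closest to an OOM.
// Assumes the 0.90 client API (ClusterStatus.getServerInfo() and the
// HServerLoad heap getters); the 0.8 threshold is only for illustration.
public class HeapWatchSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    ClusterStatus status = admin.getClusterStatus();
    for (HServerInfo server : status.getServerInfo()) {
      int used = server.getLoad().getUsedHeapMB();
      int max = server.getLoad().getMaxHeapMB();
      String warn = used > 0.8 * max ? "   <-- close to its heap limit" : "";
      System.out.println(server.getServerName() + ": " + used + "/" + max + " MB" + warn);
    }
  }
}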



> > Additional information and questions:
> >  1) We disabled automatic major compaction and run major compactions
> > manually, but in the log file I got an entry like this:
> >
> >      2011-08-07 20:57:20,706 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 5 file(s), new file=hdfs://hadoop-master.infolinks.local:8000/hbase/URLS/70c4ed1855cee6201e583662272f7a46/searches/6451756610532158137, size=6.7m; total size for store is 6.7m
> >
>
> A minor compaction can be promoted to a major one if it ends up picking
> all the store files for compaction (see earlier in the log: it will start
> off as an 'ordinary' compaction and later become a 'major' one).
>

Yeah, exactly. What I am wondering is whether it is possible to disable minor
compactions as well, the way we did with major compactions.
I found these configuration parameters, documented in the comment around line
917 of Store.java
(http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#917):

/*
 * Algorithm to choose which files to compact
 *
 * Configuration knobs:
 *  "hbase.hstore.compaction.ratio"
 *    normal case: minor compact when file <= sum(smaller_files) * ratio
 *  "hbase.hstore.compaction.min.size"
 *    unconditionally compact individual files below this size
 *  "hbase.hstore.compaction.max.size"
 *    never compact individual files above this size (unless splitting)
 *  "hbase.hstore.compaction.min"
 *    min files needed to minor compact
 *  "hbase.hstore.compaction.max"
 *    max files to compact at once (avoids OOM)
 */

And what penalty or potential problem could we run into by disabling minor
compactions (if that is possible at all)? Our use case is that we insert data
on a daily basis and create predefined regions to avoid automatic splits.

What additional tuning techniques could be suitable for this kind of HBase
usage? Something along the lines of the sketch below is what I had in mind.
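(Again just a sketch, not a recommendation: the property names are the knobs
from the Store.java comment above, the values are made up, and these settings
really belong in hbase-site.xml on the region servers; the small main() below
only shows the names and example values in one place. My assumption is that
raising hbase.hstore.compaction.min mostly postpones minor compactions during
a daily bulk load rather than truly disabling them.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch only: compaction knobs we are considering. These are server-side
// settings (hbase-site.xml on each region server); the values below are
// illustrative, not tested recommendations.
public class CompactionTuningSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // Require many store files before a minor compaction is even considered,
    // which in practice postpones minor compactions during a daily bulk load.
    conf.setInt("hbase.hstore.compaction.min", 10);

    // Bound how many files a single compaction will pick up (avoids OOM).
    conf.setInt("hbase.hstore.compaction.max", 10);

    // Never minor-compact individual files above this size (unless splitting).
    conf.setLong("hbase.hstore.compaction.max.size", 512L * 1024 * 1024);

    // Print the values back, e.g. to sanity-check what a client would see.
    System.out.println("hbase.hstore.compaction.min = "
        + conf.get("hbase.hstore.compaction.min"));
    System.out.println("hbase.hstore.compaction.max = "
        + conf.get("hbase.hstore.compaction.max"));
  }
}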



> >     We start major compaction at 00:00 every day, but this log entry's
> > time is 20:57:20, so how can I check whether the major compaction has
> > finished?
>
> The compaction is async.  Currently no flag is set on completion.  This
> is an issue we need to figure out an answer for.
>
> > And what could be a reason for it starting at 20:57 in the first place?
> >  2) There are a lot of exceptions like this:
> >      2011-08-07 01:22:34,821 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 8041 caught: java.nio.channels.ClosedChannelException
> >        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
> >        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
> >
> >      What does this exception mean, and is it normal behaviour?
>
>
> The client has given up listening.  Do you see a corresponding timeout
> around the same time on the client side?
>
>
>
Where can I check that? Our clients are the reducers of a map/reduce job. All I
see in the reducer logs is that they successfully connected to HBase, and the
ZooKeeper logs are pretty clean. Below is a rough sketch of the client-side
settings I was planning to look at on the reducer side.
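(A minimal sketch, assuming the reducers write through a plain HTable and that
the standard client knobs hbase.rpc.timeout and hbase.client.retries.number
apply to our setup; the row, column, and values are made up for illustration.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of the write path inside a reducer: buffered puts, a longer RPC
// timeout and a bounded write buffer, so that a slow (compacting) region
// server surfaces as a client-side timeout in the task logs rather than as
// a silently dropped connection.
public class ReducerWriteSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.rpc.timeout", 120000);        // assumed knob: give busy servers more time
    conf.setInt("hbase.client.retries.number", 10);  // standard client retry count

    HTable table = new HTable(conf, "URLS");
    table.setAutoFlush(false);                       // buffer puts instead of one RPC per row
    table.setWriteBufferSize(4L * 1024 * 1024);

    // Illustrative row/column only; the real keys come from the reducer input.
    Put put = new Put(Bytes.toBytes("20110802_example.com/some/page"));
    put.add(Bytes.toBytes("searches"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    table.put(put);

    table.flushCommits();   // a timeout here would show up in the reducer log
    table.close();
  }
}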


>
> > 3)   There are log entries like this:
> >       2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a. state=OFFLINE, ts=1312726415824
> >       2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a. to a random server
> >       2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb. state=OFFLINE, ts=1312726415824
> >       2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb. to a random server
> >       2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: URLS,20110731_gg,1312187408164.e7fa3b00af458db5af93d5c475712f62. state=OFFLINE, ts=1312726415824
> >
> >   What does *Regions in transition timed out* mean, and is it correct
> > behaviour?
>
>
>
> Does it go on without ever resolving?  If so, this is not usually a
> good sign.  Some recent issues have addressed this with fixes in
> 0.90.4 which should be out soon (Check its release notes for related
> issues).
>

> St.Ack
>
