Re: ideas to improve throughput of the base writting

Ted Yu Thu, 10 Jun 2010 09:11:35 -0700

https://issues.apache.org/jira/browse/HBASE-2706 has been logged.
Feel free to add comments there.


On Thu, Jun 10, 2010 at 7:33 AM, Stack <[email protected]> wrote:

> On Wed, Jun 9, 2010 at 10:27 PM, Ted Yu <[email protected]> wrote:
> > Is it possible to make one or more of the parameters dynamic?
> > That would mean embedding tuning heuristics in the HBase code.
> >
>
>
> This should be the goal for sure.  Ideally, hbase would adjust as
> the load characteristics changed.
> St.Ack
>
>
>
>
>
> > On Wed, Jun 9, 2010 at 6:08 PM, Ryan Rawson <[email protected]> wrote:
> >
> >> One issue you may run into is that 0.20 doesn't have
> >> https://issues.apache.org/jira/browse/HBASE-2066
> >>
> >> A dev preview of 0.21, which does include that and does improve
> >> performance, should be available soon.
> >>
> >> On Wed, Jun 9, 2010 at 5:55 PM, Jinsong Hu <[email protected]> wrote:
> >> > Yes, I have followed all the suggestions at
> >> > http://wiki.apache.org/hadoop/PerformanceTuning.
> >> >
> >> > I just restarted the hbase cluster and recreated the table. Data
> >> > insertion looks fine for now, and I am getting about 1k records/second.
> >> > I consider that reasonable given that my records are about 10k bytes
> >> > each. But this is only the beginning of the writing, and I notice that
> >> > hbase works fine when the table is small. When there are already lots
> >> > of records in the table, problems begin to happen.
> >> > I will report back and see how it goes after some more time.
> >> >
> >> > Jimmy.
> >> >
> >> >
> >> > --------------------------------------------------
> >> > From: "Ryan Rawson" <[email protected]>
> >> > Sent: Wednesday, June 09, 2010 5:20 PM
> >> > To: <[email protected]>
> >> > Subject: Re: ideas to improve throughput of the base writting
> >> >
> >> >> I am not familiar with that exception; I have not seen it before...
> >> >> perhaps someone else has?
> >> >>
> >> >> And my 200k rows/sec is over 19 machines.  It is the average over many
> >> >> hours.  My calculation of row size might not match how much data was
> >> >> flowing to disk, but I think it isn't too far off.
> >> >>
> >> >> Unfortunately, comparing raw disk speed in a trivial benchmark (such as
> >> >> hdparm -t) doesn't tell us the absolute speed HBase should achieve.
> >> >> This is because HBase does much more work than a raw disk write
> >> >> benchmark, in order to maintain structure and sorting.  We can
> >> >> say that 'faster disks = faster HBase performance'.
> >> >>
> >> >> From the log lines you have pasted it sounds like the regionserver's
> >> >> flush ability is not keeping up with your rate of data input.  How big
> >> >> are your records?  What is your target input speed?  Have you done
> >> >> anything on this page:
> >> >> http://wiki.apache.org/hadoop/PerformanceTuning
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Jun 9, 2010 at 4:58 PM, Jinsong Hu <[email protected]> wrote:
> >> >>>
> >> >>> My hardware has 2 disks. I did a file copy on the machine and found
> >> >>> that I can get 300 MB/second.
> >> >>>
> >> >>> At this time, I see my insertion rate is less than 1k rows/second.
> >> >>> In terms of disk writing, my record insertion rate is far below the
> >> >>> hardware limit. My row size is about 10K bytes.
> >> >>>
> >> >>> If on your i7-based servers you are doing 200k rows/sec and each row
> >> >>> is 200 bytes, then you are doing 40 MB/sec.
> >> >>>
> >> >>> In my case, if it behaves normally, I can get 100 rows/sec * 10K bytes
> >> >>> = 1 MB/sec. That is far from the disk speed. Occasionally I can see
> >> >>> 1k rows/second, which is more reasonable in my case, but I rarely get
> >> >>> that performance.
> >> >>>
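The byte-rate arithmetic above can be checked with a minimal Java sketch. The class and method names are hypothetical, and the 10K row is taken as 10,000 bytes, which is an assumption; the row rates and the 19-machine figure are the ones quoted in this thread.

```java
// Hypothetical helper for the throughput arithmetic discussed in this thread.
final class ThroughputEstimate {

    /** Bytes written per second for a given row rate and average row size. */
    static long bytesPerSecond(long rowsPerSecond, long bytesPerRow) {
        return rowsPerSecond * bytesPerRow;
    }

    /** Average share of a cluster-wide byte rate per machine (integer division). */
    static long perNode(long clusterBytesPerSecond, int machines) {
        return clusterBytesPerSecond / machines;
    }

    public static void main(String[] args) {
        // 200k rows/sec of 200-byte rows across the whole cluster.
        long cluster = bytesPerSecond(200_000, 200);
        System.out.println(cluster);               // 40000000
        // The same rate averaged over 19 machines.
        System.out.println(perNode(cluster, 19));  // 2105263
        // 100 rows/sec of 10K rows (assumed to be 10,000 bytes each).
        System.out.println(bytesPerSecond(100, 10_000));  // 1000000
    }
}
```

By the same estimate, 1k rows/sec of 10K rows is about 10 MB/sec, still well below the 300 MB/sec file-copy figure.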
> >> >>> Even worse, with the change done, I have now seen lots of compaction
> >> >>> failures:
> >> >>>
> >> >>> 2010-06-09 23:40:55,117 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed for region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527
> >> >>> java.lang.RuntimeException: java.io.IOException: Could not find target position 65588
> >> >>>      at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:61)
> >> >>>      at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79)
> >> >>>      at org.apache.hadoop.hbase.regionserver.MinorCompactingStoreScanner.next(MinorCompactingStoreScanner.java:96)
> >> >>>      at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:920)
> >> >>>      at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:764)
> >> >>>      at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:832)
> >> >>>      at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:785)
> >> >>>      at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:93)
> >> >>> Caused by: java.io.IOException: Could not find target position 65588
> >> >>>      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockAt(DFSClient.java:1556)
> >> >>>      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1666)
> >> >>>      at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1780)
> >> >>>      at java.io.DataInputStream.read(DataInputStream.java:132)
> >> >>>      at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
> >> >>>      at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
> >> >>>      at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1018)
> >> >>>      at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:966)
> >> >>>      at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1159)
> >> >>>      at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58)
> >> >>>      ... 7 more
> >> >>>
> >> >>> I can't stop this unless I restart the regionserver. After the restart
> >> >>> I truncated the table, and when I list the tables again in the shell,
> >> >>> it appears 2 times. Now I can't even disable the table and drop it.
> >> >>>
> >> >>> I will restart the whole hbase cluster and report the progress.
> >> >>>
> >> >>> Jimmy/
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> --------------------------------------------------
> >> >>> From: "Ryan Rawson" <[email protected]>
> >> >>> Sent: Wednesday, June 09, 2010 4:16 PM
> >> >>> To: <[email protected]>
> >> >>> Subject: Re: ideas to improve throughput of the base writting
> >> >>>
> >> >>>> Hey,
> >> >>>>
> >> >>>> Sounds like you are hitting limits of your hardware... I don't think
> >> >>>> you mentioned the hardware spec you are running in this thread...
> >> >>>>
> >> >>>> What you are seeing is essentially the limits of HDFS's ability to
> >> >>>> take writes.  The errors might be due to various HDFS setup problems
> >> >>>> (eg: xceiver count, file handle count, all outlined in various HBase
> >> >>>> "startup" docs)... But the overall performance might be limited by
> >> >>>> your hardware.
> >> >>>>
> >> >>>> For example, I use i7-based servers with 4 disks.  This gives a
> >> >>>> reasonable IO bandwidth, and can cope with high rates of inserts (up
> >> >>>> to 100-200k rows/sec, where each row is ~100-300 bytes).  If you are
> >> >>>> running a 1 or 2 disk system it is possible you are hitting limits of
> >> >>>> what your hardware can do.
> >> >>>>
> >> >>>> Also note that the write-pipeline performance is ultimately defined in
> >> >>>> bytes/sec, not just 'rows/sec'... thus my rows were small, and if
> >> >>>> yours are big then you might be hitting a lower 'row/sec' limit even
> >> >>>> though the amount of bytes you are writing is higher than what I might
> >> >>>> have been doing.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> On Wed, Jun 9, 2010 at 3:59 PM, Jinsong Hu <[email protected]> wrote:
> >> >>>>>
> >> >>>>> I still get lots of repetition of
> >> >>>>>
> >> >>>>> 2010-06-09 22:54:38,428 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527 has too many store files, putting it back at the end of the flush queue.
> >> >>>>> 2010-06-09 22:54:38,428 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527/1537478401 because: regionserver/10.110.8.88:60020.cacheFlusher
> >> >>>>>
> >> >>>>>
> >> >>>>> I also saw lots of
> >> >>>>>
> >> >>>>> 2010-06-09 22:50:12,527 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527: memstore size 512.0m is >= than blocking 512.0m size
> >> >>>>> 2010-06-09 22:50:12,598 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 5 on 60020' on region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527: memstore size 512.0m is >= than blocking 512.0m size
> >> >>>>>
> >> >>>>> even with the changed config. The regionserver has 4G ram.  What else
> >> >>>>> can be wrong?
> >> >>>>>
> >> >>>>> The insertion rate is still not good.
> >> >>>>>
> >> >>>>> Jimmy.
> >> >>>>>
> >> >>>>>
> >> >>>>> --------------------------------------------------
> >> >>>>> From: "Jinsong Hu" <[email protected]>
> >> >>>>> Sent: Wednesday, June 09, 2010 1:59 PM
> >> >>>>> To: <[email protected]>
> >> >>>>> Subject: Re: ideas to improve throughput of the base writting
> >> >>>>>
> >> >>>>>> Thanks. I will make this change:
> >> >>>>>>
> >> >>>>>> <property>
> >> >>>>>>  <name>hbase.hregion.memstore.block.multiplier</name>
> >> >>>>>>  <value>8</value>
> >> >>>>>> </property>
> >> >>>>>>
> >> >>>>>> <property>
> >> >>>>>>  <name>hbase.regionserver.msginterval</name>
> >> >>>>>>  <value>10000</value>
> >> >>>>>> </property>
> >> >>>>>>
> >> >>>>>>  <property>
> >> >>>>>>  <name>hbase.hstore.compactionThreshold</name>
> >> >>>>>>  <value>6</value>
> >> >>>>>> </property>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> <property>
> >> >>>>>>  <name>hbase.hstore.blockingStoreFiles</name>
> >> >>>>>>  <value>18</value>
> >> >>>>>> </property>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> and see how it goes.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Jimmy.
> >> >>>>>>
> >> >>>>>> --------------------------------------------------
> >> >>>>>> From: "Ryan Rawson" <[email protected]>
> >> >>>>>> Sent: Wednesday, June 09, 2010 1:49 PM
> >> >>>>>> To: <[email protected]>
> >> >>>>>> Subject: Re: ideas to improve throughput of the base writting
> >> >>>>>>
> >> >>>>>>> More background here... you are running into a situation where the
> >> >>>>>>> regionserver cannot flush fast enough and the size of the region's
> >> >>>>>>> memstore has climbed too high, and thus you get that error message.
> >> >>>>>>> HBase attempts to protect itself by holding up clients (thus causing
> >> >>>>>>> the low performance you see).  By expanding how big a memstore can get
> >> >>>>>>> during times of stress you can improve performance, at the cost of
> >> >>>>>>> memory usage. That is what that setting is about.
> >> >>>>>>>
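To make the effect of the multiplier concrete: updates are blocked once a region's memstore reaches the flush size times hbase.hregion.memstore.block.multiplier. The sketch below assumes a 64 MB flush size (hbase.hregion.memstore.flush.size), which is not stated in this thread but is consistent with the 128.0m and 512.0m blocking sizes in the quoted log lines. The multiplier of 2 for the earlier run is likewise an assumption, and the class and method names are hypothetical.

```java
// Hypothetical helper showing how the memstore blocking threshold scales.
final class MemstoreBlockingSize {

    static final long MB = 1024L * 1024L;

    /** Blocking threshold in bytes: flush size multiplied by the block multiplier. */
    static long blockingBytes(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    public static void main(String[] args) {
        long flushSize = 64 * MB; // assumed flush size, not stated in the thread

        // Assumed earlier multiplier of 2 matches the "blocking 128.0m size" log lines.
        System.out.println(blockingBytes(flushSize, 2) / MB + "m"); // 128m

        // Multiplier of 8, as suggested in the thread, matches "blocking 512.0m size".
        System.out.println(blockingBytes(flushSize, 8) / MB + "m"); // 512m
    }
}
```

Because the threshold applies per region, raising the multiplier increases the heap a regionserver may need when several regions are under write pressure at once.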
> >> >>>>>>> As for the 1.5 minute setting, that is the maximal amount of time a
> >> >>>>>>> handler thread will block for.  You shouldn't need to tweak that
> >> >>>>>>> value, and reducing it could cause issues.
> >> >>>>>>>
> >> >>>>>>> Now, as for compacting, HBase will compact small files into larger
> >> >>>>>>> files, and on a massive upload you can expect to see this happen
> >> >>>>>>> constantly, thus tying up 1 cpu worth on your regionserver.  You could
> >> >>>>>>> potentially reduce that by increasing the value:
> >> >>>>>>>
> >> >>>>>>>  <property>
> >> >>>>>>>  <name>hbase.hstore.compactionThreshold</name>
> >> >>>>>>>  <value>3</value>
> >> >>>>>>>  </property>
> >> >>>>>>>
> >> >>>>>>> The value is interpreted as "if there are more than 3 files for a
> >> >>>>>>> region then run the compaction check".  By raising this limit you can
> >> >>>>>>> accumulate more files before compacting them, thus reducing the
> >> >>>>>>> frequency of compactions but also potentially decreasing the
> >> >>>>>>> performance of reads (more files to read = more seeks = slower).  I'd
> >> >>>>>>> consider setting it to 5-7 or so in concert with setting
> >> >>>>>>> "hbase.hstore.blockingStoreFiles" to a value at least 2x that.
> >> >>>>>>>
> >> >>>>>>> All of these settings increase the amount of ram your regionserver may
> >> >>>>>>> need, so you will want to ensure you have at least 4000m of ram set in
> >> >>>>>>> hbase-env.sh.  This is why they are set so conservatively in the
> >> >>>>>>> default shipping config.
> >> >>>>>>>
> >> >>>>>>> These are the 3 important settings that control how often compactions
> >> >>>>>>> occur and how RPC threads get blocked.  Try tweaking all of them and
> >> >>>>>>> let me know if you are doing better.
> >> >>>>>>>
> >> >>>>>>> -ryan
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Wed, Jun 9, 2010 at 1:37 PM, Ryan Rawson <[email protected]> wrote:
> >> >>>>>>>>
> >> >>>>>>>> you also want this config:
> >> >>>>>>>>
> >> >>>>>>>> <property>
> >> >>>>>>>>  <name>hbase.hregion.memstore.block.multiplier</name>
> >> >>>>>>>>  <value>8</value>
> >> >>>>>>>> </property>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> that should hopefully clear things up.
> >> >>>>>>>>
> >> >>>>>>>> -ryan
> >> >>>>>>>>
> >> >>>>>>>> On Wed, Jun 9, 2010 at 1:34 PM, Jinsong Hu <[email protected]> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> I checked the log, there are lots of
> >> >>>>>>>>>
> >> >>>>>>>>> e 128.1m is >= than blocking 128.0m size
> >> >>>>>>>>> 2010-06-09 17:26:36,736 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 8 on 60020' on region Spam_MsgEventTable,2010-06-09 05:25:32\x09c873847edf6e5390477494956ec04729,1276104002262: memstore size 128.1m is >= than blocking 128.0m size
> >> >>>>>>>>>
> >> >>>>>>>>> then after that there are lots of
> >> >>>>>>>>>
> >> >>>>>>>>> 2010-06-09 17:26:36,800 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://namenodes1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/376337880/message_compound_terms/7606939244559826252, entries=30869, sequenceid=8350447892, memsize=7.2m, filesize=3.4m to Spam_MsgEventTable,2010-06-09 05:25:32\x09c873847edf6
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> then lots of
> >> >>>>>>>>>
> >> >>>>>>>>> 2010-06-09 17:26:39,005 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region Spam_MsgEventTable,2010-06-09 05:25:32\x09c873847edf6e5390477494956ec04729,1276104002262 'IPC Server handler 8 on 60020'
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> This cycle happens again and again in the log.  What can I do in
> >> >>>>>>>>> this case to speed up writing? Right now the writing speed is
> >> >>>>>>>>> really slow, close to 4 rows/second for a regionserver.
> >> >>>>>>>>>
> >> >>>>>>>>> I checked the code to try to find out why there are so many store
> >> >>>>>>>>> files, and I noticed that each second, when the regionserver
> >> >>>>>>>>> reports to the master, it calls the memstore flush and writes a
> >> >>>>>>>>> store file.
> >> >>>>>>>>>
> >> >>>>>>>>> The default value of the parameter hbase.regionserver.msginterval
> >> >>>>>>>>> is 1 second. I am thinking of changing it to 10 seconds. Can that
> >> >>>>>>>>> help? I am also thinking of changing
> >> >>>>>>>>> hbase.hstore.blockingStoreFiles to 1000.  I noticed that there is
> >> >>>>>>>>> a parameter hbase.hstore.blockingWaitTime with a default value of
> >> >>>>>>>>> 1.5 minutes. As soon as the 1.5 minutes is reached, the compaction
> >> >>>>>>>>> is executed. I am fine with running compaction every 1.5 minutes,
> >> >>>>>>>>> but running compaction every second and keeping CPU consistently
> >> >>>>>>>>> above 100% is not wanted.
> >> >>>>>>>>>
> >> >>>>>>>>> Any suggestions on which parameters to change to improve my
> >> >>>>>>>>> writing speed?
> >> >>>>>>>>>
> >> >>>>>>>>> Jimmy
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> --------------------------------------------------
> >> >>>>>>>>> From: "Ryan Rawson" <[email protected]>
> >> >>>>>>>>> Sent: Wednesday, June 09, 2010 1:01 PM
> >> >>>>>>>>> To: <[email protected]>
> >> >>>>>>>>> Subject: Re: ideas to improve throughput of the base writting
> >> >>>>>>>>>
> >> >>>>>>>>>> The log will say something like "blocking updates to..." when you
> >> >>>>>>>>>> hit a limit.  That log you indicate is just the regionserver
> >> >>>>>>>>>> attempting to compact a region, but shouldn't prevent updates.
> >> >>>>>>>>>>
> >> >>>>>>>>>> What else does your logfile say?  Search for the string (case
> >> >>>>>>>>>> insensitive) "blocking updates"...
> >> >>>>>>>>>>
> >> >>>>>>>>>> -ryan
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Wed, Jun 9, 2010 at 11:52 AM, Jinsong Hu
> >> >>>>>>>>>> <[email protected]>
> >> >>>>>>>>>> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> I made this change
> >> >>>>>>>>>>> <property>
> >> >>>>>>>>>>>  <name>hbase.hstore.blockingStoreFiles</name>
> >> >>>>>>>>>>>  <value>15</value>
> >> >>>>>>>>>>> </property>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> the system is still slow.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Here is the most recent value for the region:
> >> >>>>>>>>>>> stores=21, storefiles=186, storefileSizeMB=9681, memstoreSizeMB=128, storefileIndexSizeMB=12
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> And the same log still happens:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> 2010-06-09 18:36:40,577 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region SOME_ABCEventTable,2010-06-09 09:56:56\x093dc01b4d2c4872963717d80d8b5c74b1,1276107447570 has too many store files, putting it back at the end of the flush queue.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> One idea that I have now is to further increase
> >> >>>>>>>>>>> hbase.hstore.blockingStoreFiles to a very high number, such as
> >> >>>>>>>>>>> 1000.  What is the negative impact of this change?
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Jimmy
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> --------------------------------------------------
> >> >>>>>>>>>>> From: "Ryan Rawson" <[email protected]>
> >> >>>>>>>>>>> Sent: Monday, June 07, 2010 3:58 PM
> >> >>>>>>>>>>> To: <[email protected]>
> >> >>>>>>>>>>> Subject: Re: ideas to improve throughput of the base writting
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Try setting this config value:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> <property>
> >> >>>>>>>>>>>>  <name>hbase.hstore.blockingStoreFiles</name>
> >> >>>>>>>>>>>>  <value>15</value>
> >> >>>>>>>>>>>> </property>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> and see if that helps.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> The thing about the 1 compact thread is that the scarce resource
> >> >>>>>>>>>>>> being preserved in this case is cluster IO.  People have had
> >> >>>>>>>>>>>> issues with compaction IO being too heavy.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> In your case, this setting can let the regionserver build up
> >> >>>>>>>>>>>> more store files without pausing your import.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> -ryan
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> On Mon, Jun 7, 2010 at 3:52 PM, Jinsong Hu
> >> >>>>>>>>>>>> <[email protected]>
> >> >>>>>>>>>>>> wrote:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Hi,  There:
> >> >>>>>>>>>>>>>  While saving lots of data to hbase, I noticed that the
> >> >>>>>>>>>>>>> regionserver CPU went to more than 100%. Examination shows that
> >> >>>>>>>>>>>>> the hbase CompactSplit thread is spending full time
> >> >>>>>>>>>>>>> compacting/splitting hbase store files. The machine I have is an
> >> >>>>>>>>>>>>> 8 core machine. Because there is only one compact/split thread
> >> >>>>>>>>>>>>> in hbase, only one core is fully used.
> >> >>>>>>>>>>>>>  I continue to submit map/reduce jobs to insert records into
> >> >>>>>>>>>>>>> hbase. Most of the time, a job runs very fast, around 1-5
> >> >>>>>>>>>>>>> minutes. But occasionally it can take 2 hours. That is very bad
> >> >>>>>>>>>>>>> for me. I highly suspect that the occasional slow insertion is
> >> >>>>>>>>>>>>> related to the insufficient speed of the compactsplit thread.
> >> >>>>>>>>>>>>>  I am thinking that I should parallelize the compactsplit
> >> >>>>>>>>>>>>> thread. The code has this: the for loop
> >> >>>>>>>>>>>>> "for (Store store: stores.values())" can be parallelized via
> >> >>>>>>>>>>>>> java 5's threadpool, so that multiple cores are used instead of
> >> >>>>>>>>>>>>> only one. I wonder if this will help to increase the throughput.
> >> >>>>>>>>>>>>>
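The thread-pool idea described above could look roughly like the following Java sketch. It is not HBase code: CompactableStore is a hypothetical stand-in for the real Store class, and compactAll is an invented name. It only illustrates fanning the per-store loop out over a fixed pool; as noted elsewhere in this thread, the single compaction thread also limits cluster IO, so more parallelism may not help an IO-bound cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only; none of these names exist in HBase.
final class ParallelCompactionSketch {

    /** Hypothetical stand-in for a store that can be compacted. */
    interface CompactableStore {
        /** Compacts the store and returns an illustrative result size. */
        long compact() throws Exception;
    }

    /** Runs compact() on every store using a fixed-size pool; results keep input order. */
    static List<Long> compactAll(List<? extends CompactableStore> stores, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (CompactableStore store : stores) {
                tasks.add(store::compact);
            }
            List<Long> results = new ArrayList<>();
            // invokeAll blocks until every task has finished or failed.
            for (Future<Long> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a task failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<CompactableStore> stores = new ArrayList<>();
        stores.add(() -> 10L);
        stores.add(() -> 20L);
        System.out.println(compactAll(stores, 2));
    }
}
```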
> >> >>>>>>>>>>>>>  Somebody mentioned that I can increase the region size so that
> >> >>>>>>>>>>>>> I don't do so many compactions under heavy writing.
> >> >>>>>>>>>>>>> Does anybody have experience showing it helps?
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Jimmy.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>  byte [] compactStores(final boolean majorCompaction)
> >> >>>>>>>>>>>>>  throws IOException {
> >> >>>>>>>>>>>>>    if (this.closing.get() || this.closed.get()) {
> >> >>>>>>>>>>>>>      LOG.debug("Skipping compaction on " + this + " because closing/closed");
> >> >>>>>>>>>>>>>      return null;
> >> >>>>>>>>>>>>>    }
> >> >>>>>>>>>>>>>    splitsAndClosesLock.readLock().lock();
> >> >>>>>>>>>>>>>    try {
> >> >>>>>>>>>>>>>      byte [] splitRow = null;
> >> >>>>>>>>>>>>>      if (this.closed.get()) {
> >> >>>>>>>>>>>>>        return splitRow;
> >> >>>>>>>>>>>>>      }
> >> >>>>>>>>>>>>>      try {
> >> >>>>>>>>>>>>>
> >> >>
> >> >
> >>
> >
>
