On Wed, Jun 9, 2010 at 10:27 PM, Ted Yu <[email protected]> wrote:
> Is it possible for one or more of the parameters to be dynamic?
> Meaning embedding the tuning heuristics in HBase code.
This should be the goal for sure. Ideally, HBase would adjust as the loading characteristics changed.
St.Ack

On Wed, Jun 9, 2010 at 6:08 PM, Ryan Rawson <[email protected]> wrote:

One issue you may run into is that 0.20 doesn't have
https://issues.apache.org/jira/browse/HBASE-2066

A dev preview of 0.21, which does include that and does improve performance, should be available soon.

On Wed, Jun 9, 2010 at 5:55 PM, Jinsong Hu <[email protected]> wrote:

Yes, I have done all the suggestions of
http://wiki.apache.org/hadoop/PerformanceTuning.

I just restarted the hbase cluster and recreated the table. The data insertion looks fine for now and I am getting about 1k records/second. I consider that to be reasonable given that my records are about 10k bytes each. But this is the beginning of the writing, and I notice that when the table is small, hbase works fine; when there are already lots of records in the table, problems begin to happen. I will report back and see how it goes after some more time.

Jimmy.

--------------------------------------------------
From: "Ryan Rawson" <[email protected]>
Sent: Wednesday, June 09, 2010 5:20 PM
To: <[email protected]>
Subject: Re: ideas to improve throughput of the hbase writing

I am not familiar with that exception, I have not seen it before... perhaps someone else has?

And my 200k rows/sec is over 19 machines. It is the average over many hours. My calculation of row size might not match how much data was flowing to disk, but I think it isn't too far off.

Unfortunately, comparing raw disk speed in a trivial benchmark (such as hdparm -t) doesn't tell us how fast HBase should perform in absolute terms. This is because HBase does much more work than a raw disk write benchmark -- it has to maintain structure and sorting. We can only say that faster disks = faster HBase performance.

From the log lines you have pasted it sounds like the regionserver's flush ability is not keeping up with your rate of data input. How big are your records? What is your target input speed? Have you done anything on this page:
http://wiki.apache.org/hadoop/PerformanceTuning

On Wed, Jun 9, 2010 at 4:58 PM, Jinsong Hu <[email protected]> wrote:

My hardware has 2 disks. I did a file copy on the machine and found that I can get 300 Mbyte/second.

At this time I see my insertion rate is less than 1k rows/second, and my row size is about 10K bytes, so in terms of disk writing my record insertion rate is far less than the hardware limit.

If on your i7-based servers you are doing 200k rows/sec and each row is 200 bytes, then you are doing 40 Mbyte/sec.

In my case, if it behaves normally, I can get 100 rows/sec * 10K bytes = 1 Mbyte/sec, which is far from the disk speed. Occasionally I see 1k rows/second, which is more reasonable in my case, but I rarely get that performance.
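As an aside on the raw numbers: 200k rows/sec at ~200 bytes/row is ~40 Mbyte/sec spread over 19 machines, i.e. roughly 2 Mbyte/sec per node, so in bytes/sec per node the two setups are closer than the rows/sec figures suggest. On the client side, the batching that the PerformanceTuning page recommends is the usual first lever; below is a minimal sketch against the 0.20-era client API -- the family/qualifier names, the loop, and the 12 MB buffer are only illustrative, not anything taken from this thread.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedPutsSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "Spam_MsgEventTable");

    table.setAutoFlush(false);                  // buffer puts client-side instead of one RPC per row
    table.setWriteBufferSize(12 * 1024 * 1024); // ~12 MB write buffer, illustrative

    byte[] family = Bytes.toBytes("f");         // illustrative family/qualifier
    byte[] qualifier = Bytes.toBytes("q");
    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(family, qualifier, new byte[10 * 1024]); // ~10 KB value, like the rows discussed here
      table.put(put);                           // queued locally, flushed when the buffer fills
    }
    table.flushCommits();                       // push whatever is still sitting in the buffer
  }
}

Whether this helps depends on whether the bottleneck is client round-trips or, as the log lines further down suggest, the regionserver blocking on flushes and compactions.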
Even worse, with the change done, now I have seen lots of compaction failures:

2010-06-09 23:40:55,117 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed for region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527
java.lang.RuntimeException: java.io.IOException: Could not find target position 65588
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:61)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79)
        at org.apache.hadoop.hbase.regionserver.MinorCompactingStoreScanner.next(MinorCompactingStoreScanner.java:96)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:920)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:764)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:832)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:785)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:93)
Caused by: java.io.IOException: Could not find target position 65588
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockAt(DFSClient.java:1556)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1666)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1780)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1018)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:966)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1159)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58)
        ... 7 more

I can't stop this unless I restart the regionserver. After the restart I truncated the table, and when I list the table again in the shell, it appears twice. Now I can't even disable the table and drop it.

I will restart the whole hbase cluster and report the progress.

Jimmy.

--------------------------------------------------
From: "Ryan Rawson" <[email protected]>
Sent: Wednesday, June 09, 2010 4:16 PM
To: <[email protected]>
Subject: Re: ideas to improve throughput of the hbase writing

Hey,

Sounds like you are hitting the limits of your hardware... I don't think you mentioned the hardware spec you are running in this thread...

What you are seeing is essentially the limits of HDFS's ability to take writes. The errors might be due to various HDFS setup problems (e.g. xceiver count, file handle count, all outlined in the various HBase "startup" docs)... But the overall performance might be limited by your hardware.

For example, I use i7-based servers with 4 disks.
This gives reasonable IO bandwidth, and can cope with high rates of inserts, up to 100-200k rows/sec (each row is ~100-300 bytes). If you are running a 1 or 2 disk system it is possible you are hitting the limits of what your hardware can do.

Also note that the write-pipeline performance is ultimately defined in bytes/sec, not just 'rows/sec'... thus my rows were small, and if yours are big then you might be hitting a lower 'rows/sec' limit even though the amount of bytes you are writing is higher than what I might have been doing.

On Wed, Jun 9, 2010 at 3:59 PM, Jinsong Hu <[email protected]> wrote:

I still get lots of repetition of

2010-06-09 22:54:38,428 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527 has too many store files, putting it back at the end of the flush queue.
2010-06-09 22:54:38,428 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527/1537478401 because: regionserver/10.110.8.88:60020.cacheFlusher

I also saw lots of

2010-06-09 22:50:12,527 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527: memstore size 512.0m is >= than blocking 512.0m size
2010-06-09 22:50:12,598 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 5 on 60020' on region Spam_MsgEventTable,2010-06-09 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527: memstore size 512.0m is >= than blocking 512.0m size

even with the changed config. The regionserver has 4G of RAM. What else can be wrong?

The insertion rate is still not good.

Jimmy.

--------------------------------------------------
From: "Jinsong Hu" <[email protected]>
Sent: Wednesday, June 09, 2010 1:59 PM
To: <[email protected]>
Subject: Re: ideas to improve throughput of the hbase writing

Thanks. I will make this change:

<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>

<property>
  <name>hbase.regionserver.msginterval</name>
  <value>10000</value>
</property>

<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>6</value>
</property>

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>18</value>
</property>

and see how it goes.

Jimmy.
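For context on the numbers in those log lines: the point at which a region blocks updates works out to hbase.hregion.memstore.flush.size multiplied by hbase.hregion.memstore.block.multiplier. Assuming the default 64 MB flush size, the default multiplier of 2 gives the 128.0m ceiling that shows up in the older messages further down the thread, while the multiplier of 8 being set here gives 8 x 64 MB = 512 MB, matching the "512.0m is >= than blocking 512.0m size" lines above.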
--------------------------------------------------
From: "Ryan Rawson" <[email protected]>
Sent: Wednesday, June 09, 2010 1:49 PM
To: <[email protected]>
Subject: Re: ideas to improve throughput of the hbase writing

More background here... you are running into a situation where the regionserver cannot flush fast enough, the size of the region's memstore has climbed too high, and thus you get that error message. HBase attempts to protect itself by holding up clients (thus causing the low performance you see). By expanding how big a memstore can get during times of stress you can improve performance, at the cost of memory usage. That is what that setting is about.

As for the 1.5 minute setting, that is the maximal amount of time a handler thread will block for. You shouldn't need to tweak that value, and reducing it could cause issues.

Now, as for compacting, HBase will compact small files into larger files, and on a massive upload you can expect to see this happen constantly, thus tying up one CPU's worth on your regionserver. You could potentially reduce that by increasing the value:

<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value>
</property>

The value is interpreted as "if there are more than 3 files for a region then run the compaction check". By raising this limit you can accumulate more files before compacting them, thus reducing the frequency of compactions but also potentially reducing the performance of reads (more files to read = more seeks = slower). I'd consider setting it to 5-7 or so, in concert with setting "hbase.hstore.blockingStoreFiles" to a value at least 2x that.

All of these settings increase the amount of RAM your regionserver may need, so you will want to ensure you have at least 4000m of RAM set in hbase-env.sh. This is why they are set so conservatively in the default shipping config.

These are the 3 important settings that control how often compactions occur and how RPC threads get blocked. Try tweaking all of them and let me know if you are doing better.

-ryan

On Wed, Jun 9, 2010 at 1:37 PM, Ryan Rawson <[email protected]> wrote:

you also want this config:

<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>

that should hopefully clear things up.

-ryan
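(The regionserver heap Ryan refers to is set in conf/hbase-env.sh through the HBASE_HEAPSIZE variable, in megabytes -- e.g. export HBASE_HEAPSIZE=4000 for the 4000m figure above. The value is the JVM heap, so the machine needs RAM headroom beyond it.)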
On Wed, Jun 9, 2010 at 1:34 PM, Jinsong Hu <[email protected]> wrote:

I checked the log; there are lots of

2010-06-09 17:26:36,736 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 8 on 60020' on region Spam_MsgEventTable,2010-06-09 05:25:32\x09c873847edf6e5390477494956ec04729,1276104002262: memstore size 128.1m is >= than blocking 128.0m size

then after that there are lots of

2010-06-09 17:26:36,800 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://namenodes1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/376337880/message_compound_terms/7606939244559826252, entries=30869, sequenceid=8350447892, memsize=7.2m, filesize=3.4m to Spam_MsgEventTable,2010-06-09 05:25:32\x09c873847edf6

then lots of

2010-06-09 17:26:39,005 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region Spam_MsgEventTable,2010-06-09 05:25:32\x09c873847edf6e5390477494956ec04729,1276104002262 'IPC Server handler 8 on 60020'

This cycle happens again and again in the log. What can I do in this case to speed up writing? Right now the writing speed is really slow, close to 4 rows/second for a regionserver.

I checked the code to try to find out why there are so many store files, and I noticed that each second, when the regionserver reports to the master, it calls the memstore flush and writes a store file.

The parameter hbase.regionserver.msginterval has a default value of 1 second; I am thinking of changing it to 10 seconds. Can that help? I am also thinking of changing hbase.hstore.blockingStoreFiles to 1000. I noticed that there is a parameter hbase.hstore.blockingWaitTime with a default value of 1.5 minutes; as soon as the 1.5 minutes is reached, the compaction is executed. I am fine with running compaction every 1.5 minutes, but running compaction every second and causing CPU consistently higher than 100% is not wanted.

Any suggestion what kind of parameters to change to improve my writing speed?

Jimmy

--------------------------------------------------
From: "Ryan Rawson" <[email protected]>
Sent: Wednesday, June 09, 2010 1:01 PM
To: <[email protected]>
Subject: Re: ideas to improve throughput of the hbase writing

The log will say something like "blocking updates to..." when you hit a limit.
That log you indicate is just the regionserver attempting to compact a region, but shouldn't prevent updates.

What else does your logfile say? Search for the string (case insensitive) "blocking updates"...

-ryan

On Wed, Jun 9, 2010 at 11:52 AM, Jinsong Hu <[email protected]> wrote:

I made this change:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>

The system is still slow.

Here is the most recent value for the region:
stores=21, storefiles=186, storefileSizeMB=9681, memstoreSizeMB=128, storefileIndexSizeMB=12

And the same log still happens:

2010-06-09 18:36:40,577 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region SOME_ABCEventTable,2010-06-09 09:56:56\x093dc01b4d2c4872963717d80d8b5c74b1,1276107447570 has too many store files, putting it back at the end of the flush queue.

One idea that I have now is to further increase hbase.hstore.blockingStoreFiles to a very high number, such as 1000. What is the negative impact of this change?

Jimmy

--------------------------------------------------
From: "Ryan Rawson" <[email protected]>
Sent: Monday, June 07, 2010 3:58 PM
To: <[email protected]>
Subject: Re: ideas to improve throughput of the hbase writing

Try setting this config value:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>

and see if that helps.

The thing about the single compaction thread is that the scarce resource being preserved in this case is cluster IO. People have had issues with compaction IO being too heavy.

In your case, this setting can let the regionserver build up more store files without pausing your import.

-ryan

On Mon, Jun 7, 2010 at 3:52 PM, Jinsong Hu <[email protected]> wrote:

Hi, there:

While saving lots of data to hbase, I noticed that the regionserver CPU went to more than 100%. Examination shows that the hbase CompactSplit thread is spending all its time compacting/splitting hbase store files. The machine I have is an 8-core machine; because there is only one compact/split thread in hbase, only one core is fully used.
I continue to submit map/reduce jobs to insert records into hbase. Most of the time, a job runs very fast, around 1-5 minutes. But occasionally it can take 2 hours. That is very bad for me. I highly suspect that the occasional slow insertion is related to the insufficient speed of the compact/split thread.

I am thinking that I should parallelize the compact/split work. The code has this: the for loop "for (Store store: stores.values())" can be parallelized via Java 5's thread pool, so that multiple cores are used instead of only one. I wonder if this will help to increase the throughput.

Somebody mentioned that I can increase the region size so that I don't do so many compactions under a heavy-writing situation. Does anybody have experience showing that it helps?

Jimmy.

byte [] compactStores(final boolean majorCompaction) throws IOException {
  if (this.closing.get() || this.closed.get()) {
    LOG.debug("Skipping compaction on " + this + " because closing/closed");
    return null;
  }
  splitsAndClosesLock.readLock().lock();
  try {
    byte [] splitRow = null;
    if (this.closed.get()) {
      return splitRow;
    }
    try {
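For what it's worth, the parallelization Jimmy sketches above would look roughly like the code below. This is not the actual HRegion code: Store here is a stand-in interface, and compactStoresInParallel plus the pool size are made up for illustration; it only shows the shape of fanning the per-store loop out over a Java 5 thread pool.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCompactionSketch {

  /** Stand-in for the real HBase Store class; only the one call this sketch needs. */
  interface Store {
    void compact() throws Exception;
  }

  /**
   * Fan the per-store compactions of one region out over a small pool
   * instead of running them serially on the single CompactSplit thread.
   */
  static void compactStoresInParallel(List<Store> stores, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Void>> results = new ArrayList<Future<Void>>();
      for (final Store store : stores) {
        results.add(pool.submit(new Callable<Void>() {
          public Void call() throws Exception {
            store.compact();
            return null;
          }
        }));
      }
      for (Future<Void> f : results) {
        f.get(); // rethrows any compaction failure as ExecutionException
      }
    } finally {
      pool.shutdown();
    }
  }
}

Ryan's point above still applies, though: the single thread is there to cap compaction IO, so spreading the work over more cores trades cluster IO for the CPU bottleneck rather than removing the cost.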
