https://issues.apache.org/jira/browse/HBASE-2706 has been logged. Feel free to add comments there.
On Thu, Jun 10, 2010 at 7:33 AM, Stack <[email protected]> wrote: > On Wed, Jun 9, 2010 at 10:27 PM, Ted Yu <[email protected]> wrote: > > Is it possible for one or more of the parameters dynamic ? > > Meaning embedding tuning heuristic in HBase code. > > > > > This should be the goal for sure. Ideally, hbase would adjust as > the loading character changed. > St.Ack > > > > > > > On Wed, Jun 9, 2010 at 6:08 PM, Ryan Rawson <[email protected]> wrote: > > > >> One issue you may run into is that 0.20 doesn't have > >> https://issues.apache.org/jira/browse/HBASE-2066 > >> > >> a dev preview of 0.21 which does include that, and does improve > >> performance should be available soon. > >> > >> On Wed, Jun 9, 2010 at 5:55 PM, Jinsong Hu <[email protected]> > wrote: > >> > Yes, I have done all the suggestion of the > >> > http://wiki.apache.org/hadoop/PerformanceTuning. > >> > > >> > I just restarted the hbase cluster and recreated the table, the data > >> > insertion looks fine for now and > >> > I am getting about 1k record/second . I consider that to be reasonable > >> > giving that my record is about > >> > 10k bytes per record. but this is the beginning of the writing and I > >> notice > >> > that when the table is small, > >> > the hbase works fine. when there are lots of records in the table > >> already, > >> > problem begin to happen. > >> > I will report back and see how it goes after some more time. > >> > > >> > Jimmy. > >> > > >> > > >> > -------------------------------------------------- > >> > From: "Ryan Rawson" <[email protected]> > >> > Sent: Wednesday, June 09, 2010 5:20 PM > >> > To: <[email protected]> > >> > Subject: Re: ideas to improve throughput of the base writting > >> > > >> >> I am not familiar with that exception, I have not seen of it > before... > >> >> perhaps someone else has? > >> >> > >> >> And my 200k rows/sec is over 19 machines. It is the average over > many > >> >> hours. 
My calculation of row size might not match how much data was > >> >> flowing to disk, but I think it isn't too far off. > >> >> > >> >> Unfortunately comparing raw disk speed in a trivial benchmark (such > as > >> >> hdparm -t is) doesn't tell us how absolute speed of HBase must > >> >> perform. This is because HBase does much more work than a raw disk > >> >> write benchmark -- doing so to maintain structure and sorting. We > can > >> >> say that 'faster disks = faster HBase performance'. > >> >> > >> >> From the log lines you have pasted it sounds like the regionserver's > >> >> flush ability is not keeping up with your rate of data input. How > big > >> >> are your records? What is your target input speed? Have you done > >> >> anything on this page: > >> >> http://wiki.apache.org/hadoop/PerformanceTuning > >> >> > >> >> > >> >> > >> >> On Wed, Jun 9, 2010 at 4:58 PM, Jinsong Hu <[email protected]> > >> wrote: > >> >>> > >> >>> My hardware has 2 disks. I did a file copy on the machine and found > >> that > >> >>> I > >> >>> can get 300 mbyte/second. > >> >>> > >> >>> At this time, I see my insertion is less than 1k/second. my row size > is > >> . > >> >>> in > >> >>> terms of disk writing. my record > >> >>> insertion rate is far less than the hardware limit. my row size is > >> about > >> >>> 10K byte > >> >>> > >> >>> if in your i7-based server, you are doing 200k row/sec, each row is > 200 > >> >>> byte, then you are doing 40M byte/sec. > >> >>> > >> >>> in my case, if it behaves normally, I can get 100 row/sec * 10K byte > >> =1M > >> >>> /sec. > >> >>> that is far from the disk speed. occasionally I can see 1k > row/second. > >> >>> which > >> >>> is more reasonable in my case, > >> >>> but I rarely get that performance. 
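The arithmetic in the message above can be checked mechanically. This is only a minimal sketch: it assumes "10K byte" means 10,000 bytes per row, and the class and method names are made up for illustration, not anything in HBase.

```java
// Illustrative only: the names here are hypothetical, not HBase APIs.
class WriteRateSketch {
    // Payload rate is rows per second times bytes per row.
    static long bytesPerSecond(long rowsPerSecond, long bytesPerRow) {
        return rowsPerSecond * bytesPerRow;
    }

    public static void main(String[] args) {
        // Figures quoted in the thread: 200k rows/sec at 200 bytes each.
        System.out.println(bytesPerSecond(200_000, 200));   // 40000000
        // 100 rows/sec and 1k rows/sec at an assumed 10,000 bytes per row.
        System.out.println(bytesPerSecond(100, 10_000));    // 1000000
        System.out.println(bytesPerSecond(1_000, 10_000));  // 10000000
    }
}
```

On these numbers, 1k rows/sec of 10 KB rows is about 10 MB/sec of payload, so comparing the two setups by rows/sec alone understates the gap much less than it first appears.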
> >> >>> > >> >>> even worse, with the change done, now I have seem lots of compaction > >> >>> failure: > >> >>> > >> >>> 2010-06-09 23:40:55,117 ERROR > >> >>> org.apache.hadoop.hbase.regionserver.CompactSplitT > >> >>> hread: Compaction failed for region Spam_MsgEventTable,2010-06-09 > >> >>> 20:05:20\x0905 > >> >>> 860d4bf1cb268ef69391cf97de9f64,1276121160527 > >> >>> java.lang.RuntimeException: java.io.IOException: Could not find > target > >> >>> position > >> >>> 65588 > >> >>> at > >> >>> > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileS > >> >>> canner.java:61) > >> >>> at > >> >>> > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.j > >> >>> ava:79) > >> >>> at > >> >>> > org.apache.hadoop.hbase.regionserver.MinorCompactingStoreScanner.next > >> >>> (MinorCompactingStoreScanner.java:96) > >> >>> at > >> >>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:920) > >> >>> at > >> >>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:764) > >> >>> at > >> >>> > org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.ja > >> >>> va:832) > >> >>> at > >> >>> > org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.ja > >> >>> va:785) > >> >>> at > >> >>> > org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSp > >> >>> litThread.java:93) > >> >>> Caused by: java.io.IOException: Could not find target position 65588 > >> >>> at > >> >>> > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockAt(DFSClien > >> >>> t.java:1556) > >> >>> at > >> >>> > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient > >> >>> .java:1666) > >> >>> at > >> >>> > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1 > >> >>> 780) > >> >>> at java.io.DataInputStream.read(DataInputStream.java:132) > >> >>> at > >> >>> > org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(Bou > >> >>> ndedRangeFileInputStream.java:105) > >> >>> at 
org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100) > >> >>> at > >> >>> > org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1 > >> >>> 018) > >> >>> at > >> >>> > org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:96 > >> >>> 6) > >> >>> at > >> >>> > org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java > >> >>> :1159) > >> >>> at > >> >>> > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileS > >> >>> canner.java:58) > >> >>> ... 7 more > >> >>> > >> >>> I can't stop this unless I restarted the regionserver. After restart > I > >> >>> truncate the table, and when I list the table again in shell, > >> >>> it appears 2 times. now I can't even disable the table and drop it. > >> >>> > >> >>> I will restart the whole hbase cluster and report the progress. > >> >>> > >> >>> Jimmy/ > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> -------------------------------------------------- > >> >>> From: "Ryan Rawson" <[email protected]> > >> >>> Sent: Wednesday, June 09, 2010 4:16 PM > >> >>> To: <[email protected]> > >> >>> Subject: Re: ideas to improve throughput of the base writting > >> >>> > >> >>>> Hey, > >> >>>> > >> >>>> Sounds like you are hitting limits of your hardware... I dont think > >> >>>> you mentioned the hardware spec you are running in this thread... > >> >>>> > >> >>>> What you are seeing is essentially the limits of HDFS's ability to > >> >>>> take writes. The errors might be due to various HDFS setup > problems > >> >>>> (eg: xceiver count, file handle count, all outlined in various > HBase > >> >>>> "startup" docs)... But the overall performance might be limited by > >> >>>> your hardware. > >> >>>> > >> >>>> For example, I use i7-based servers with 4 disks. This gives a > >> >>>> reasonable IO bandwidth, and can cope with high rates of inserts > (upto > >> >>>> 100-200k rows/sec (each row is ~ 100-300 bytes). 
If you are > running a > >> >>>> 1 or 2 disk system it is possible you are hitting limits of what > your > >> >>>> hardware can do. > >> >>>> > >> >>>> Also note that the write-pipeline performance is ultimately defined > in > >> >>>> bytes/sec, not just 'rows/sec'... thus my rows were small, and if > >> >>>> yours are big then you might be hitting a lower 'row/sec' limit > even > >> >>>> though the amount of bytes you are writing is higher than what i > might > >> >>>> have been doing. > >> >>>> > >> >>>> > >> >>>> > >> >>>> On Wed, Jun 9, 2010 at 3:59 PM, Jinsong Hu <[email protected] > > > >> >>>> wrote: > >> >>>>> > >> >>>>> I still get lots of repetition of > >> >>>>> > >> >>>>> 2010-06-09 22:54:38,428 WARN > >> >>>>> org.apache.hadoop.hbase.regionserver.MemStoreFlushe > >> >>>>> r: Region Spam_MsgEventTable,2010-06-09 > >> >>>>> 20:05:20\x0905860d4bf1cb268ef69391cf97de > >> >>>>> 9f64,1276121160527 has too many store files, putting it back at > the > >> end > >> >>>>> of > >> >>>>> the f > >> >>>>> lush queue. 
> >> >>>>> 2010-06-09 22:54:38,428 DEBUG > >> >>>>> org.apache.hadoop.hbase.regionserver.CompactSplitT > >> >>>>> hread: Compaction requested for region > Spam_MsgEventTable,2010-06-09 > >> >>>>> 20:05:20\x0 > >> >>>>> 905860d4bf1cb268ef69391cf97de9f64,1276121160527/1537478401 > because: > >> >>>>> regionserver > >> >>>>> /10.110.8.88:60020.cacheFlusher > >> >>>>> > >> >>>>> > >> >>>>> I also saw lots of > >> >>>>> > >> >>>>> 2010-06-09 22:50:12,527 INFO > >> >>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>> Block > >> >>>>> ing updates for 'IPC Server handler 1 on 60020' on region > >> >>>>> Spam_MsgEventTable,201 > >> >>>>> 0-06-09 > 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527: > >> >>>>> memstore > >> >>>>> siz > >> >>>>> e 512.0m is >= than blocking 512.0m size > >> >>>>> 2010-06-09 22:50:12,598 INFO > >> >>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>> Block > >> >>>>> ing updates for 'IPC Server handler 5 on 60020' on region > >> >>>>> Spam_MsgEventTable,201 > >> >>>>> 0-06-09 > 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527: > >> >>>>> memstore > >> >>>>> siz > >> >>>>> e 512.0m is >= than blocking 512.0m size > >> >>>>> > >> >>>>> even with the changed config. the regionserver has 4G ram. what > else > >> >>>>> can > >> >>>>> be > >> >>>>> wrong ? > >> >>>>> > >> >>>>> The insertion rate is still not good. > >> >>>>> > >> >>>>> Jimmy. > >> >>>>> > >> >>>>> > >> >>>>> -------------------------------------------------- > >> >>>>> From: "Jinsong Hu" <[email protected]> > >> >>>>> Sent: Wednesday, June 09, 2010 1:59 PM > >> >>>>> To: <[email protected]> > >> >>>>> Subject: Re: ideas to improve throughput of the base writting > >> >>>>> > >> >>>>>> Thanks. 
I will make this change: > >> >>>>>> > >> >>>>>> <property> > >> >>>>>> <name>hbase.hregion.memstore.block.multiplier</name> > >> >>>>>> <value>8</value> > >> >>>>>> </property> > >> >>>>>> > >> >>>>>> <property> > >> >>>>>> <name>hbase.regionserver.msginterval</name> > >> >>>>>> <value>10000</value> > >> >>>>>> </property> > >> >>>>>> > >> >>>>>> <property> > >> >>>>>> <name>hbase.hstore.compactionThreshold</name> > >> >>>>>> <value>6</value> > >> >>>>>> </property> > >> >>>>>> > >> >>>>>> > >> >>>>>> <property> > >> >>>>>> <name>hbase.hstore.blockingStoreFiles</name> > >> >>>>>> <value>18</value> > >> >>>>>> </property> > >> >>>>>> > >> >>>>>> > >> >>>>>> and see how it goes. > >> >>>>>> > >> >>>>>> > >> >>>>>> Jimmy. > >> >>>>>> > >> >>>>>> -------------------------------------------------- > >> >>>>>> From: "Ryan Rawson" <[email protected]> > >> >>>>>> Sent: Wednesday, June 09, 2010 1:49 PM > >> >>>>>> To: <[email protected]> > >> >>>>>> Subject: Re: ideas to improve throughput of the base writting > >> >>>>>> > >> >>>>>>> More background here... you are running into a situation where > the > >> >>>>>>> regionserver cannot flush fast enough and the size of the > region's > >> >>>>>>> memstore has climbed too high and thus you get that error > message. > >> >>>>>>> HBase attempts to protect itself by holding up clients (thus > >> causing > >> >>>>>>> the low performance you see). By expanding how big a memstore > can > >> >>>>>>> get > >> >>>>>>> during times of stress you can improve performance, at the cost > of > >> >>>>>>> memory usage. That is what that setting is about. > >> >>>>>>> > >> >>>>>>> As for the 1.5 minute setting, that is the maximal amount of > time a > >> >>>>>>> handler thread will block for. You shouldn't need to tweak that > >> >>>>>>> value, and reducing it could cause issues. 
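A rough sketch of how the multiplier relates to the "blocking ... size" figures in the logs quoted in this thread. It assumes the blocking size is the memstore flush size times hbase.hregion.memstore.block.multiplier, with a 64 MB flush size and a default multiplier of 2. Neither default is stated in the thread, but both are consistent with the 128.0m figure logged before the change and the 512.0m figure logged after the multiplier was set to 8. The class and method names are illustrative, not HBase code.

```java
// Illustrative only: a model of the blocking check, not the HRegion implementation.
class MemstoreBlockingSketch {
    static final long MB = 1024L * 1024L;

    // Assumed relationship: blocking size = flush size * block multiplier.
    static long blockingSizeBytes(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    // Updates are held up once the memstore reaches the blocking size
    // (the log message uses ">=").
    static boolean blocksUpdates(long memstoreBytes, long flushSizeBytes, int blockMultiplier) {
        return memstoreBytes >= blockingSizeBytes(flushSizeBytes, blockMultiplier);
    }

    public static void main(String[] args) {
        long flushSize = 64 * MB; // assumed flush size, see the note above
        System.out.println(blockingSizeBytes(flushSize, 2) / MB); // 128
        System.out.println(blockingSizeBytes(flushSize, 8) / MB); // 512
        // A 200 MB memstore blocks with multiplier 2 but not with multiplier 8.
        System.out.println(blocksUpdates(200 * MB, flushSize, 2)); // true
        System.out.println(blocksUpdates(200 * MB, flushSize, 8)); // false
    }
}
```

A larger multiplier only buys headroom; if flushes stay slower than the input rate, the memstore still reaches the higher limit, just later and with more memory in use.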
> >> >>>>>>> > >> >>>>>>> Now, as for compacting, HBase will compact small files into > larger > >> >>>>>>> files, and on a massive upload you can expect to see this happen > >> >>>>>>> constantly, thus tying up 1 cpu worth on your regionserver. You > >> >>>>>>> could > >> >>>>>>> potentially reduce that by increasing the value: > >> >>>>>>> > >> >>>>>>> <property> > >> >>>>>>> <name>hbase.hstore.compactionThreshold</name> > >> >>>>>>> <value>3</value> > >> >>>>>>> > >> >>>>>>> the value is interpreted as "if there are more than 3 files for > a > >> >>>>>>> region then run the compaction check". By raising this limit > you > >> can > >> >>>>>>> accumulate more files before compacting them, thus reducing the > >> >>>>>>> frequency of compactions but also potentially increasing the > >> >>>>>>> performance of reads (more files to read = more seeks = slower). > >> I'd > >> >>>>>>> consider setting it to 5-7 or so in concert with setting > >> >>>>>>> "hbase.hstore.blockingStoreFiles" to a value at least 2x that. > >> >>>>>>> > >> >>>>>>> All of these settings increase the amount of ram your > regionserver > >> >>>>>>> may > >> >>>>>>> need, so you will want to ensure you have at least 4000m of ram > set > >> >>>>>>> in > >> >>>>>>> hbase-env.sh. This is why they are set so conservatively in the > >> >>>>>>> default shipping config. > >> >>>>>>> > >> >>>>>>> These are the 3 important settings that control how often > >> compactions > >> >>>>>>> occur and how RPC threads get blocked. Try tweaking all of them > >> and > >> >>>>>>> let me know if you are doing better. 
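The rule of thumb above (keep hbase.hstore.blockingStoreFiles at least 2x hbase.hstore.compactionThreshold) is easy to check before a restart. A minimal sketch; the class and method are hypothetical helpers, not part of HBase, and the sample values are ones posted elsewhere in this thread.

```java
// Illustrative only: a sanity check for the two settings, not an HBase API.
class CompactionSettingsSketch {
    // True when blockingStoreFiles is at least twice compactionThreshold.
    static boolean followsRuleOfThumb(int compactionThreshold, int blockingStoreFiles) {
        return blockingStoreFiles >= 2 * compactionThreshold;
    }

    public static void main(String[] args) {
        // compactionThreshold=6 with blockingStoreFiles=18, as posted in the thread.
        System.out.println(followsRuleOfThumb(6, 18)); // true
        // Raising the threshold to 7 without raising the blocking limit past 13 would not qualify.
        System.out.println(followsRuleOfThumb(7, 13)); // false
    }
}
```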
> >> >>>>>>> > >> >>>>>>> -ryan > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> On Wed, Jun 9, 2010 at 1:37 PM, Ryan Rawson <[email protected] > > > >> >>>>>>> wrote: > >> >>>>>>>> > >> >>>>>>>> you also want this config: > >> >>>>>>>> > >> >>>>>>>> <property> > >> >>>>>>>> <name>hbase.hregion.memstore.block.multiplier</name> > >> >>>>>>>> <value>8</value> > >> >>>>>>>> </property> > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> that should hopefully clear things up. > >> >>>>>>>> > >> >>>>>>>> -ryan > >> >>>>>>>> > >> >>>>>>>> On Wed, Jun 9, 2010 at 1:34 PM, Jinsong Hu < > >> [email protected]> > >> >>>>>>>> wrote: > >> >>>>>>>>> > >> >>>>>>>>> I checked the log, there are lots of > >> >>>>>>>>> > >> >>>>>>>>> e 128.1m is >= than blocking 128.0m size > >> >>>>>>>>> 2010-06-09 17:26:36,736 INFO > >> >>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>>>> Block > >> >>>>>>>>> ing updates for 'IPC Server handler 8 on 60020' on region > >> >>>>>>>>> Spam_MsgEventTable,201 > >> >>>>>>>>> 0-06-09 > >> 05:25:32\x09c873847edf6e5390477494956ec04729,1276104002262: > >> >>>>>>>>> memstore > >> >>>>>>>>> siz > >> >>>>>>>>> e 128.1m is >= than blocking 128.0m size > >> >>>>>>>>> > >> >>>>>>>>> then after that there are lots of > >> >>>>>>>>> > >> >>>>>>>>> 2010-06-09 17:26:36,800 DEBUG > >> >>>>>>>>> org.apache.hadoop.hbase.regionserver.Store: > >> >>>>>>>>> Added > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> hdfs:// > >> > namenodes1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/376337880/messag > >> >>>>>>>>> e_compound_terms/7606939244559826252, entries=30869, > >> >>>>>>>>> sequenceid=8350447892, > >> >>>>>>>>> mems > >> >>>>>>>>> ize=7.2m, filesize=3.4m to Spam_MsgEventTable,2010-06-09 > >> >>>>>>>>> 05:25:32\x09c873847edf6 > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> then lots of > >> >>>>>>>>> > >> >>>>>>>>> 2010-06-09 17:26:39,005 INFO > >> >>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>>>> Unblo > >> >>>>>>>>> cking updates for region 
Spam_MsgEventTable,2010-06-09 > >> >>>>>>>>> 05:25:32\x09c873847edf6e5 > >> >>>>>>>>> 390477494956ec04729,1276104002262 'IPC Server handler 8 on > 60020' > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> This cycle happens again and again in the log. What can I do > in > >> >>>>>>>>> this > >> >>>>>>>>> case > >> >>>>>>>>> to speed up writing ? > >> >>>>>>>>> right now the writing speed is really slow. close to 4 > >> rows/second > >> >>>>>>>>> for > >> >>>>>>>>> a > >> >>>>>>>>> regionserver. > >> >>>>>>>>> > >> >>>>>>>>> I checked the code and try to find out why there are so many > >> store > >> >>>>>>>>> files, > >> >>>>>>>>> and I noticed each second > >> >>>>>>>>> the regionserver reports to master, it calls the memstore > flush > >> and > >> >>>>>>>>> write a > >> >>>>>>>>> store file. > >> >>>>>>>>> > >> >>>>>>>>> the parameter hbase.regionserver.msginterval default value is > 1 > >> >>>>>>>>> second. > >> >>>>>>>>> I am > >> >>>>>>>>> thinking to change to 10 second. > >> >>>>>>>>> can that help ? I am also thinking to change > >> >>>>>>>>> hbase.hstore.blockingStoreFiles > >> >>>>>>>>> to 1000. I noticed that there is a parameter > >> >>>>>>>>> hbase.hstore.blockingWaitTime with default value of 1.5 > minutes. > >> as > >> >>>>>>>>> long as > >> >>>>>>>>> the 1.5 minutes is reached, > >> >>>>>>>>> the compaction is executed. I am fine with running compaction > >> every > >> >>>>>>>>> 1.5 > >> >>>>>>>>> minutes, but running compaction every second > >> >>>>>>>>> and causing CPU consistently higher than 100% is not wanted. > >> >>>>>>>>> > >> >>>>>>>>> Any suggestion what kind of parameters to change to improve my > >> >>>>>>>>> writing > >> >>>>>>>>> speed > >> >>>>>>>>> ? 
> >> >>>>>>>>> > >> >>>>>>>>> Jimmy > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> -------------------------------------------------- > >> >>>>>>>>> From: "Ryan Rawson" <[email protected]> > >> >>>>>>>>> Sent: Wednesday, June 09, 2010 1:01 PM > >> >>>>>>>>> To: <[email protected]> > >> >>>>>>>>> Subject: Re: ideas to improve throughput of the base writting > >> >>>>>>>>> > >> >>>>>>>>>> The log will say something like "blocking updates to..." when > >> you > >> >>>>>>>>>> hit > >> >>>>>>>>>> a limit. That log you indicate is just the regionserver > >> >>>>>>>>>> attempting > >> >>>>>>>>>> to > >> >>>>>>>>>> compact a region, but shouldn't prevent updates. > >> >>>>>>>>>> > >> >>>>>>>>>> what else does your logfile say? Search for the string (case > >> >>>>>>>>>> insensitive) "blocking updates"... > >> >>>>>>>>>> > >> >>>>>>>>>> -ryan > >> >>>>>>>>>> > >> >>>>>>>>>> On Wed, Jun 9, 2010 at 11:52 AM, Jinsong Hu > >> >>>>>>>>>> <[email protected]> > >> >>>>>>>>>> wrote: > >> >>>>>>>>>>> > >> >>>>>>>>>>> I made this change > >> >>>>>>>>>>> <property> > >> >>>>>>>>>>> <name>hbase.hstore.blockingStoreFiles</name> > >> >>>>>>>>>>> <value>15</value> > >> >>>>>>>>>>> </property> > >> >>>>>>>>>>> > >> >>>>>>>>>>> the system is still slow. > >> >>>>>>>>>>> > >> >>>>>>>>>>> Here is the most recent value for the region : > >> >>>>>>>>>>> stores=21, storefiles=186, storefileSizeMB=9681, > >> >>>>>>>>>>> memstoreSizeMB=128, > >> >>>>>>>>>>> storefileIndexSizeMB=12 > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>>> And the same log still happens: > >> >>>>>>>>>>> > >> >>>>>>>>>>> 2010-06-09 18:36:40,577 WARN org.apache.h > >> >>>>>>>>>>> adoop.hbase.regionserver.MemStoreFlusher: Region > >> >>>>>>>>>>> SOME_ABCEventTable,2010-06-09 0 > >> >>>>>>>>>>> 9:56:56\x093dc01b4d2c4872963717d80d8b5c74b1,1276107447570 > has > >> too > >> >>>>>>>>>>> many > >> >>>>>>>>>>> store > >> >>>>>>>>>>> fil > >> >>>>>>>>>>> es, putting it back at the end of the flush queue. 
> >> >>>>>>>>>>> > >> >>>>>>>>>>> One idea that I have now is to further increase the > >> >>>>>>>>>>> hbase.hstore.blockingStoreFiles to a very high > >> >>>>>>>>>>> Number, such as 1000. What is the negative impact of this > >> change > >> >>>>>>>>>>> ? > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>>> Jimmy > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>>> -------------------------------------------------- > >> >>>>>>>>>>> From: "Ryan Rawson" <[email protected]> > >> >>>>>>>>>>> Sent: Monday, June 07, 2010 3:58 PM > >> >>>>>>>>>>> To: <[email protected]> > >> >>>>>>>>>>> Subject: Re: ideas to improve throughput of the base > writting > >> >>>>>>>>>>> > >> >>>>>>>>>>>> Try setting this config value: > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> <property> > >> >>>>>>>>>>>> <name>hbase.hstore.blockingStoreFiles</name> > >> >>>>>>>>>>>> <value>15</value> > >> >>>>>>>>>>>> </property> > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> and see if that helps. > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> The thing about the 1 compact thread is the scarce > resources > >> >>>>>>>>>>>> being > >> >>>>>>>>>>>> preserved in this case is cluster IO. People have had > issues > >> >>>>>>>>>>>> with > >> >>>>>>>>>>>> compaction IO being too heavy. > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> in your case, this setting can let the regionserver build > up > >> >>>>>>>>>>>> more > >> >>>>>>>>>>>> store files without pausing your import. > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> -ryan > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> On Mon, Jun 7, 2010 at 3:52 PM, Jinsong Hu > >> >>>>>>>>>>>> <[email protected]> > >> >>>>>>>>>>>> wrote: > >> >>>>>>>>>>>>> > >> >>>>>>>>>>>>> Hi, There: > >> >>>>>>>>>>>>> While saving lots of data to on hbase, I noticed that > the > >> >>>>>>>>>>>>> regionserver > >> >>>>>>>>>>>>> CPU > >> >>>>>>>>>>>>> went to more than 100%. 
> >> >>>>>>>>>>>>> Examination shows that the HBase CompactSplit thread is
> >> >>>>>>>>>>>>> spending all of its time compacting and splitting HBase
> >> >>>>>>>>>>>>> store files. The machine I have is an 8-core machine.
> >> >>>>>>>>>>>>> Because there is only one compact/split thread in HBase,
> >> >>>>>>>>>>>>> only one core is fully used.
> >> >>>>>>>>>>>>> I continue to submit map/reduce jobs to insert records
> >> >>>>>>>>>>>>> into HBase. Most of the time a job runs very fast, around
> >> >>>>>>>>>>>>> 1-5 minutes, but occasionally it can take 2 hours. That
> >> >>>>>>>>>>>>> is very bad for me. I strongly suspect that the
> >> >>>>>>>>>>>>> occasional slow insertion is related to the compact/split
> >> >>>>>>>>>>>>> thread not being fast enough.
> >> >>>>>>>>>>>>> I am thinking that I should parallelize the compact/split
> >> >>>>>>>>>>>>> work. In the code below, the for loop
> >> >>>>>>>>>>>>> "for (Store store: stores.values())" could be
> >> >>>>>>>>>>>>> parallelized via Java 5's thread pool, so that multiple
> >> >>>>>>>>>>>>> cores are used instead of only one. I wonder if this
> >> >>>>>>>>>>>>> will help to increase the throughput.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Somebody mentioned that I can increase the region size so
> >> >>>>>>>>>>>>> that I don't do so many compactions under heavy writing.
> >> >>>>>>>>>>>>> Does anybody have experience showing that it helps?
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Jimmy.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> byte [] compactStores(final boolean majorCompaction)
> >> >>>>>>>>>>>>> throws IOException {
> >> >>>>>>>>>>>>>   if (this.closing.get() || this.closed.get()) {
> >> >>>>>>>>>>>>>     LOG.debug("Skipping compaction on " + this + " because closing/closed");
> >> >>>>>>>>>>>>>     return null;
> >> >>>>>>>>>>>>>   }
> >> >>>>>>>>>>>>>   splitsAndClosesLock.readLock().lock();
> >> >>>>>>>>>>>>>   try {
> >> >>>>>>>>>>>>>     byte [] splitRow = null;
> >> >>>>>>>>>>>>>     if (this.closed.get()) {
> >> >>>>>>>>>>>>>       return splitRow;
> >> >>>>>>>>>>>>>     }
> >> >>>>>>>>>>>>>     try {
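The idea from the original message, running the per-store loop on a thread pool, could look roughly like the sketch below. SketchStore and compactAll are hypothetical stand-ins, not the HBase Store class or HRegion.compactStores; the real method also holds splitsAndClosesLock and tracks a split row, which this ignores. As noted earlier in the thread, the single compaction thread exists to protect cluster IO, so more threads may simply move the bottleneck from CPU to disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a store; only the shape of the loop matters here.
interface SketchStore {
    String compact() throws Exception; // returns a label for the compacted output
}

class ParallelCompactionSketch {
    // Submits one compaction task per store and waits for all of them.
    // Results come back in the iteration order of stores.values().
    static List<String> compactAll(Map<String, SketchStore> stores, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final SketchStore store : stores.values()) {
                pending.add(pool.submit(new Callable<String>() {
                    public String call() throws Exception {
                        return store.compact();
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // rethrows a task failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, SketchStore> stores = new TreeMap<>();
        for (final String family : new String[] {"a", "b", "c"}) {
            stores.put(family, new SketchStore() {
                public String compact() {
                    return "compacted:" + family;
                }
            });
        }
        System.out.println(compactAll(stores, 2));
    }
}
```

A fixed pool keeps the number of concurrent compactions bounded, which matters if the limiting resource turns out to be HDFS write bandwidth rather than a single core.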
