I am not familiar with that exception; I have not seen it before... perhaps someone else has?
And my 200k rows/sec is over 19 machines. It is the average over many hours.
My calculation of row size might not match how much data was flowing to disk,
but I think it isn't too far off.

Unfortunately, comparing raw disk speed in a trivial benchmark (such as
hdparm -t) doesn't tell us what absolute speed HBase should achieve, because
HBase does much more work than a raw disk write benchmark -- maintaining
structure and sorting. We can say that 'faster disks = faster HBase
performance'.

From the log lines you have pasted, it sounds like the regionserver's flush
ability is not keeping up with your rate of data input. How big are your
records? What is your target input speed? Have you done anything on this
page: http://wiki.apache.org/hadoop/PerformanceTuning

On Wed, Jun 9, 2010 at 4:58 PM, Jinsong Hu <[email protected]> wrote:
> My hardware has 2 disks. I did a file copy on the machine and found that I
> can get 300 mbyte/second.
>
> At this time, I see my insertion is less than 1k/second. In terms of disk
> writing, my record insertion rate is far less than the hardware limit. My
> row size is about 10K bytes.
>
> If in your i7-based server you are doing 200k rows/sec, and each row is
> 200 bytes, then you are doing 40M bytes/sec.
>
> In my case, if it behaves normally, I can get 100 rows/sec * 10K bytes =
> 1M bytes/sec; that is far from the disk speed. Occasionally I can see 1k
> rows/second, which is more reasonable in my case, but I rarely get that
> performance.
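The rows/sec-versus-bytes/sec arithmetic traded back and forth above can be
sanity-checked mechanically; here is a tiny sketch using only the figures
quoted in this thread (nothing else is assumed):

```java
// Back-of-envelope write-pipeline load check, using numbers from this thread.
public class ThroughputCheck {

    // rows/sec * bytes/row = bytes/sec of write-pipeline load
    static long bytesPerSec(long rowsPerSec, long bytesPerRow) {
        return rowsPerSec * bytesPerRow;
    }

    public static void main(String[] args) {
        // Ryan's figure: 200k rows/sec of ~200-byte rows, aggregate
        System.out.println(bytesPerSec(200000L, 200L));      // 40,000,000 B/s = 40 MB/s
        // ...spread over 19 machines, roughly 2.1 MB/s per node
        System.out.println(bytesPerSec(200000L, 200L) / 19);
        // Jimmy's normal case: 100 rows/sec of ~10KB rows
        System.out.println(bytesPerSec(100L, 10000L));       // 1,000,000 B/s = 1 MB/s
    }
}
```

This is why a big-row workload can look slow in rows/sec while still pushing a
comparable number of bytes through the write pipeline.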
>
> Even worse, with the change done, now I see lots of compaction failures:
>
> 2010-06-09 23:40:55,117 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> Compaction failed for region Spam_MsgEventTable,2010-06-09
> 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527
> java.lang.RuntimeException: java.io.IOException: Could not find target position 65588
>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:61)
>     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79)
>     at org.apache.hadoop.hbase.regionserver.MinorCompactingStoreScanner.next(MinorCompactingStoreScanner.java:96)
>     at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:920)
>     at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:764)
>     at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:832)
>     at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:785)
>     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:93)
> Caused by: java.io.IOException: Could not find target position 65588
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockAt(DFSClient.java:1556)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1666)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1780)
>     at java.io.DataInputStream.read(DataInputStream.java:132)
>     at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>     at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1018)
>     at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:966)
>     at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1159)
>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58)
>     ... 7 more
>
> I can't stop this unless I restart the regionserver. After restarting, I
> truncated the table, and when I list the tables again in the shell, it
> appears 2 times. Now I can't even disable the table and drop it.
>
> I will restart the whole hbase cluster and report the progress.
>
> Jimmy
>
> --------------------------------------------------
> From: "Ryan Rawson" <[email protected]>
> Sent: Wednesday, June 09, 2010 4:16 PM
> To: <[email protected]>
> Subject: Re: ideas to improve throughput of the base writting
>
>> Hey,
>>
>> Sounds like you are hitting the limits of your hardware... I don't think
>> you mentioned the hardware spec you are running in this thread...
>>
>> What you are seeing is essentially the limits of HDFS's ability to take
>> writes. The errors might be due to various HDFS setup problems (e.g.
>> xceiver count, file handle count, all outlined in various HBase
>> "startup" docs)... But the overall performance might be limited by your
>> hardware.
>>
>> For example, I use i7-based servers with 4 disks. This gives reasonable
>> IO bandwidth and can cope with high rates of inserts, up to 100-200k
>> rows/sec (each row is ~100-300 bytes). If you are running a 1- or 2-disk
>> system, it is possible you are hitting the limits of what your hardware
>> can do.
>>
>> Also note that the write-pipeline performance is ultimately defined in
>> bytes/sec, not just 'rows/sec'... thus my rows were small, and if yours
>> are big then you might be hitting a lower 'rows/sec' limit even though
>> the number of bytes you are writing is higher than what I might have
>> been doing.
>>
>>
>> On Wed, Jun 9, 2010 at 3:59 PM, Jinsong Hu <[email protected]> wrote:
>>>
>>> I still get lots of repetition of
>>>
>>> 2010-06-09 22:54:38,428 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
>>> Region Spam_MsgEventTable,2010-06-09
>>> 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527 has too many
>>> store files, putting it back at the end of the flush queue.
>>> 2010-06-09 22:54:38,428 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>>> Compaction requested for region Spam_MsgEventTable,2010-06-09
>>> 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527/1537478401
>>> because: regionserver/10.110.8.88:60020.cacheFlusher
>>>
>>> I also saw lots of
>>>
>>> 2010-06-09 22:50:12,527 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> Blocking updates for 'IPC Server handler 1 on 60020' on region
>>> Spam_MsgEventTable,2010-06-09
>>> 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527: memstore
>>> size 512.0m is >= than blocking 512.0m size
>>> 2010-06-09 22:50:12,598 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> Blocking updates for 'IPC Server handler 5 on 60020' on region
>>> Spam_MsgEventTable,2010-06-09
>>> 20:05:20\x0905860d4bf1cb268ef69391cf97de9f64,1276121160527: memstore
>>> size 512.0m is >= than blocking 512.0m size
>>>
>>> This is even with the changed config. The regionserver has 4G of RAM;
>>> what else can be wrong?
>>>
>>> The insertion rate is still not good.
>>>
>>> Jimmy.
>>>
>>> --------------------------------------------------
>>> From: "Jinsong Hu" <[email protected]>
>>> Sent: Wednesday, June 09, 2010 1:59 PM
>>> To: <[email protected]>
>>> Subject: Re: ideas to improve throughput of the base writting
>>>
>>>> Thanks.
>>>> I will make this change:
>>>>
>>>> <property>
>>>>   <name>hbase.hregion.memstore.block.multiplier</name>
>>>>   <value>8</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>hbase.regionserver.msginterval</name>
>>>>   <value>10000</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>hbase.hstore.compactionThreshold</name>
>>>>   <value>6</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>hbase.hstore.blockingStoreFiles</name>
>>>>   <value>18</value>
>>>> </property>
>>>>
>>>> and see how it goes.
>>>>
>>>> Jimmy.
>>>>
>>>> --------------------------------------------------
>>>> From: "Ryan Rawson" <[email protected]>
>>>> Sent: Wednesday, June 09, 2010 1:49 PM
>>>> To: <[email protected]>
>>>> Subject: Re: ideas to improve throughput of the base writting
>>>>
>>>>> More background here... you are running into a situation where the
>>>>> regionserver cannot flush fast enough: the size of the region's
>>>>> memstore has climbed too high, and thus you get that error message.
>>>>> HBase attempts to protect itself by holding up clients (thus causing
>>>>> the low performance you see). By expanding how big a memstore can get
>>>>> during times of stress, you can improve performance at the cost of
>>>>> memory usage. That is what that setting is about.
>>>>>
>>>>> As for the 1.5-minute setting, that is the maximal amount of time a
>>>>> handler thread will block for. You shouldn't need to tweak that
>>>>> value, and reducing it could cause issues.
>>>>>
>>>>> Now, as for compacting, HBase will compact small files into larger
>>>>> files, and on a massive upload you can expect to see this happen
>>>>> constantly, thus tying up one CPU's worth on your regionserver. You
>>>>> could potentially reduce that by increasing this value:
>>>>>
>>>>> <property>
>>>>>   <name>hbase.hstore.compactionThreshold</name>
>>>>>   <value>3</value>
>>>>> </property>
>>>>>
>>>>> The value is interpreted as "if there are more than 3 files for a
>>>>> region then run the compaction check".
>>>>> By raising this limit you can accumulate more files before compacting
>>>>> them, thus reducing the frequency of compactions, but also
>>>>> potentially hurting the performance of reads (more files to read =
>>>>> more seeks = slower). I'd consider setting it to 5-7 or so, in
>>>>> concert with setting "hbase.hstore.blockingStoreFiles" to a value at
>>>>> least 2x that.
>>>>>
>>>>> All of these settings increase the amount of RAM your regionserver
>>>>> may need, so you will want to ensure you have at least 4000m of RAM
>>>>> set in hbase-env.sh. This is why they are set so conservatively in
>>>>> the default shipping config.
>>>>>
>>>>> These are the 3 important settings that control how often compactions
>>>>> occur and how RPC threads get blocked. Try tweaking all of them and
>>>>> let me know if you are doing better.
>>>>>
>>>>> -ryan
>>>>>
>>>>>
>>>>> On Wed, Jun 9, 2010 at 1:37 PM, Ryan Rawson <[email protected]> wrote:
>>>>>>
>>>>>> you also want this config:
>>>>>>
>>>>>> <property>
>>>>>>   <name>hbase.hregion.memstore.block.multiplier</name>
>>>>>>   <value>8</value>
>>>>>> </property>
>>>>>>
>>>>>> that should hopefully clear things up.
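The "blocking ... size" figures in the logs quoted in this thread follow
directly from the block multiplier. A minimal sketch of that relationship,
assuming the era's default memstore flush size of 64 MB
(hbase.hregion.memstore.flush.size); the multiplier values are the ones
discussed above:

```java
// blocking size = memstore flush size * hbase.hregion.memstore.block.multiplier
// Assumes the 64 MB default flush size.
public class MemstoreBlocking {
    static final long FLUSH_SIZE_BYTES = 64L * 1024 * 1024; // 64 MB default

    static long blockingSizeMB(int blockMultiplier) {
        return (FLUSH_SIZE_BYTES * blockMultiplier) / (1024 * 1024);
    }

    public static void main(String[] args) {
        // default multiplier of 2 -> the "blocking 128.0m size" seen in the
        // earlier logs in this thread
        System.out.println(blockingSizeMB(2));  // 128
        // multiplier of 8, as configured above -> "blocking 512.0m size"
        System.out.println(blockingSizeMB(8));  // 512
    }
}
```

This also explains why the blocking threshold in the later logs jumped from
128.0m to 512.0m once the multiplier was raised to 8.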
>>>>>>
>>>>>> -ryan
>>>>>>
>>>>>> On Wed, Jun 9, 2010 at 1:34 PM, Jinsong Hu <[email protected]> wrote:
>>>>>>>
>>>>>>> I checked the log; there are lots of
>>>>>>>
>>>>>>> e 128.1m is >= than blocking 128.0m size
>>>>>>> 2010-06-09 17:26:36,736 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>>>>>> Blocking updates for 'IPC Server handler 8 on 60020' on region
>>>>>>> Spam_MsgEventTable,2010-06-09
>>>>>>> 05:25:32\x09c873847edf6e5390477494956ec04729,1276104002262: memstore
>>>>>>> size 128.1m is >= than blocking 128.0m size
>>>>>>>
>>>>>>> then after that there are lots of
>>>>>>>
>>>>>>> 2010-06-09 17:26:36,800 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>>>>>>> Added hdfs://namenodes1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/376337880/message_compound_terms/7606939244559826252,
>>>>>>> entries=30869, sequenceid=8350447892, memsize=7.2m, filesize=3.4m to
>>>>>>> Spam_MsgEventTable,2010-06-09 05:25:32\x09c873847edf6
>>>>>>>
>>>>>>> then lots of
>>>>>>>
>>>>>>> 2010-06-09 17:26:39,005 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>>>>>> Unblocking updates for region Spam_MsgEventTable,2010-06-09
>>>>>>> 05:25:32\x09c873847edf6e5390477494956ec04729,1276104002262 'IPC
>>>>>>> Server handler 8 on 60020'
>>>>>>>
>>>>>>> This cycle happens again and again in the log. What can I do in this
>>>>>>> case to speed up writing? Right now the writing speed is really
>>>>>>> slow, close to 4 rows/second for a regionserver.
>>>>>>>
>>>>>>> I checked the code to try to find out why there are so many store
>>>>>>> files, and I noticed that each second, when the regionserver reports
>>>>>>> to the master, it calls the memstore flush and writes a store file.
>>>>>>>
>>>>>>> The parameter hbase.regionserver.msginterval has a default value of
>>>>>>> 1 second.
>>>>>>> I am thinking of changing it to 10 seconds; can that help? I am
>>>>>>> also thinking of changing hbase.hstore.blockingStoreFiles to 1000.
>>>>>>> I noticed that there is a parameter hbase.hstore.blockingWaitTime
>>>>>>> with a default value of 1.5 minutes; once the 1.5 minutes is
>>>>>>> reached, the compaction is executed. I am fine with running
>>>>>>> compaction every 1.5 minutes, but running compaction every second
>>>>>>> and driving the CPU consistently higher than 100% is not wanted.
>>>>>>>
>>>>>>> Any suggestion on what kind of parameters to change to improve my
>>>>>>> writing speed?
>>>>>>>
>>>>>>> Jimmy
>>>>>>>
>>>>>>> --------------------------------------------------
>>>>>>> From: "Ryan Rawson" <[email protected]>
>>>>>>> Sent: Wednesday, June 09, 2010 1:01 PM
>>>>>>> To: <[email protected]>
>>>>>>> Subject: Re: ideas to improve throughput of the base writting
>>>>>>>
>>>>>>>> The log will say something like "blocking updates to..." when you
>>>>>>>> hit a limit. That log you indicate is just the regionserver
>>>>>>>> attempting to compact a region, but shouldn't prevent updates.
>>>>>>>>
>>>>>>>> What else does your logfile say? Search for the string (case
>>>>>>>> insensitive) "blocking updates"...
>>>>>>>>
>>>>>>>> -ryan
>>>>>>>>
>>>>>>>> On Wed, Jun 9, 2010 at 11:52 AM, Jinsong Hu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I made this change:
>>>>>>>>>
>>>>>>>>> <property>
>>>>>>>>>   <name>hbase.hstore.blockingStoreFiles</name>
>>>>>>>>>   <value>15</value>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>> The system is still slow.
>>>>>>>>>
>>>>>>>>> Here is the most recent value for the region:
>>>>>>>>>
>>>>>>>>> stores=21, storefiles=186, storefileSizeMB=9681,
>>>>>>>>> memstoreSizeMB=128, storefileIndexSizeMB=12
>>>>>>>>>
>>>>>>>>> And the same log still happens:
>>>>>>>>>
>>>>>>>>> 2010-06-09 18:36:40,577 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
>>>>>>>>> Region SOME_ABCEventTable,2010-06-09
>>>>>>>>> 09:56:56\x093dc01b4d2c4872963717d80d8b5c74b1,1276107447570 has too
>>>>>>>>> many store files, putting it back at the end of the flush queue.
>>>>>>>>>
>>>>>>>>> One idea that I have now is to further increase
>>>>>>>>> hbase.hstore.blockingStoreFiles to a very high number, such as
>>>>>>>>> 1000. What is the negative impact of this change?
>>>>>>>>>
>>>>>>>>> Jimmy
>>>>>>>>>
>>>>>>>>> --------------------------------------------------
>>>>>>>>> From: "Ryan Rawson" <[email protected]>
>>>>>>>>> Sent: Monday, June 07, 2010 3:58 PM
>>>>>>>>> To: <[email protected]>
>>>>>>>>> Subject: Re: ideas to improve throughput of the base writting
>>>>>>>>>
>>>>>>>>>> Try setting this config value:
>>>>>>>>>>
>>>>>>>>>> <property>
>>>>>>>>>>   <name>hbase.hstore.blockingStoreFiles</name>
>>>>>>>>>>   <value>15</value>
>>>>>>>>>> </property>
>>>>>>>>>>
>>>>>>>>>> and see if that helps.
>>>>>>>>>>
>>>>>>>>>> The thing about the single compaction thread is that the scarce
>>>>>>>>>> resource being preserved in this case is cluster IO. People have
>>>>>>>>>> had issues with compaction IO being too heavy.
>>>>>>>>>>
>>>>>>>>>> In your case, this setting can let the regionserver build up more
>>>>>>>>>> store files without pausing your import.
>>>>>>>>>>
>>>>>>>>>> -ryan
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 7, 2010 at 3:52 PM, Jinsong Hu <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi, There:
>>>>>>>>>>>
>>>>>>>>>>> While saving lots of data to hbase, I noticed that the
>>>>>>>>>>> regionserver CPU went to more than 100%. Examination shows that
>>>>>>>>>>> the hbase CompactSplit thread is spending full time working on
>>>>>>>>>>> compacting/splitting hbase store files. The machine I have is an
>>>>>>>>>>> 8-core machine. Because there is only one compact/split thread
>>>>>>>>>>> in hbase, only one core is fully used.
>>>>>>>>>>>
>>>>>>>>>>> I continue to submit map/reduce jobs to insert records into
>>>>>>>>>>> hbase. Most of the time, a job runs very fast, around 1-5
>>>>>>>>>>> minutes. But occasionally it can take 2 hours. That is very bad
>>>>>>>>>>> for me. I highly suspect that the occasional slow insertion is
>>>>>>>>>>> related to the insufficient speed of the compactsplit thread.
>>>>>>>>>>>
>>>>>>>>>>> I am thinking that I should parallelize the compactsplit thread.
>>>>>>>>>>> The code has this: the for loop "for (Store store:
>>>>>>>>>>> stores.values())" can be parallelized via Java 5's thread pool,
>>>>>>>>>>> so that multiple cores are used instead of only one. I wonder if
>>>>>>>>>>> this will help to increase the throughput.
>>>>>>>>>>>
>>>>>>>>>>> Somebody mentioned that I can increase the region size so that I
>>>>>>>>>>> don't do so many compactions under a heavy writing situation.
>>>>>>>>>>> Does anybody have experience showing it helps?
>>>>>>>>>>>
>>>>>>>>>>> Jimmy.
>>>>>>>>>>>
>>>>>>>>>>> byte [] compactStores(final boolean majorCompaction)
>>>>>>>>>>> throws IOException {
>>>>>>>>>>>   if (this.closing.get() || this.closed.get()) {
>>>>>>>>>>>     LOG.debug("Skipping compaction on " + this + " because closing/closed");
>>>>>>>>>>>     return null;
>>>>>>>>>>>   }
>>>>>>>>>>>   splitsAndClosesLock.readLock().lock();
>>>>>>>>>>>   try {
>>>>>>>>>>>     byte [] splitRow = null;
>>>>>>>>>>>     if (this.closed.get()) {
>>>>>>>>>>>       return splitRow;
>>>>>>>>>>>     }
>>>>>>>>>>>     try {
>>>>>>>>>>>       synchronized (writestate) {
>>>>>>>>>>>         if (!writestate.compacting && writestate.writesEnabled) {
>>>>>>>>>>>           writestate.compacting = true;
>>>>>>>>>>>         } else {
>>>>>>>>>>>           LOG.info("NOT compacting region " + this +
>>>>>>>>>>>               ": compacting=" + writestate.compacting +
>>>>>>>>>>>               ", writesEnabled=" + writestate.writesEnabled);
>>>>>>>>>>>           return splitRow;
>>>>>>>>>>>         }
>>>>>>>>>>>       }
>>>>>>>>>>>       LOG.info("Starting" + (majorCompaction? " major " : " ") +
>>>>>>>>>>>           "compaction on region " + this);
>>>>>>>>>>>       long startTime = System.currentTimeMillis();
>>>>>>>>>>>       doRegionCompactionPrep();
>>>>>>>>>>>       long maxSize = -1;
>>>>>>>>>>>       for (Store store: stores.values()) {
>>>>>>>>>>>         final Store.StoreSize ss = store.compact(majorCompaction);
>>>>>>>>>>>         if (ss != null && ss.getSize() > maxSize) {
>>>>>>>>>>>           maxSize = ss.getSize();
>>>>>>>>>>>           splitRow = ss.getSplitRow();
>>>>>>>>>>>         }
>>>>>>>>>>>       }
>>>>>>>>>>>       doRegionCompactionCleanup();
>>>>>>>>>>>       String timeTaken = StringUtils.formatTimeDiff(
>>>>>>>>>>>           System.currentTimeMillis(), startTime);
>>>>>>>>>>>       LOG.info("compaction completed on region " + this + " in " + timeTaken);
>>>>>>>>>>>     } finally {
>>>>>>>>>>>       synchronized (writestate) {
>>>>>>>>>>>         writestate.compacting = false;
>>>>>>>>>>>         writestate.notifyAll();
>>>>>>>>>>>       }
>>>>>>>>>>>     }
>>>>>>>>>>>     return splitRow;
>>>>>>>>>>>   } finally {
>>>>>>>>>>>     splitsAndClosesLock.readLock().unlock();
>>>>>>>>>>>   }
>>>>>>>>>>> }
