Usually the logs are pretty chatty about what's blocking them. Here's one example of me going through my own logs: http://search-hadoop.com/m/fJ0vh6ojHm1

J-D

On Tue, Mar 22, 2011 at 4:18 AM, Ferdy Galema <[email protected]> wrote:
> HBase already makes my life better, so no worries there :)
>
> I agree the topic of this thread is not clear anymore. I also already know
> how to tackle my problem. So just for the record let me explain what I was
> thinking/doing:
>
> The original intent was to clean up my HBase installation (remove floating
> regions and storefiles). We have had some crashes in the past and therefore
> there were still some minor inconsistencies. I had never run the hbck tool;
> in fact I was not aware of it. A second intent was to decrease the number of
> regions.
>
> However, I wrongly decided that the best way to do this was an export
> followed by an import on a clean dataset. This way I could avoid
> the process of digging into the data files and merging the regions manually.
> Of course it would work if I tuned the (import) performance parameters
> better or simply accepted a long wait for the import to finish.
> So my first posting was about these performance issues. After that, I
> quickly turned to manually cleaning/merging regions. This worked.
>
> So although my initial problems were solved, I was still a bit concerned. I
> know that importing is more expensive than exporting, but I did not expect
> to see a difference of more than an order of magnitude. I thought there might
> as well be something terribly wrong with my configuration, or my assumptions
> about the way the clients/regionservers can be tuned in order to increase
> bulkloading performance. For example, the assumption that increasing the
> hbase.hstore.compactionThreshold and hbase.hstore.blockingStoreFiles to
> excessive amounts will completely disable minor compactions. (By the way,
> I'm still not sure if it does and if it's smart to do that when importing).
>
> Ferdy.
>
> On 03/22/2011 02:22 AM, Jean-Daniel Cryans wrote:
>>
>> I feel like I'm not understanding your need correctly, could you
>> explain what you think HBase should be doing in order to give you a
>> better life?
>>
>> Thx,
>>
>> J-D
>>
>> On Mon, Mar 21, 2011 at 5:22 PM, Ferdy Galema<[email protected]> wrote:
>>>
>>> These methods are certainly helpful whenever I need to do a heavy
>>> import. For now I got away with manually cleaning my regions/stores and
>>> merging the data. I thought importing/exporting was the easy way to do
>>> that, but I guess that's not (yet) true.
>>>
>>> On 03/21/2011 09:48 PM, Jean-Daniel Cryans wrote:
>>>>
>>>> What you are describing is solved usually by either:
>>>>
>>>> - pre-creating the regions so that you don't have to go through the
>>>> "growing pains" of a new, virgin table. Use this sort of method:
>>>>
>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
>>>>
>>>> - use the bulk loader: http://hbase.apache.org/bulk-loads.html
>>>>
>>>> J-D
>>>>
>>>> On Fri, Mar 18, 2011 at 5:46 AM, Ferdy Galema<[email protected]> wrote:
>>>>>
>>>>> On second thought, removing the obsolete regionfolders was easily done
>>>>> by hand. This way I can merge regions with the merge tool.
>>>>>
>>>>> However, I'm still bothered by the (performance) issues I ran into. Any
>>>>> advice would be helpful.
>>>>>
>>>>> On 03/18/2011 11:06 AM, Ferdy Galema wrote:
>>>>>>
>>>>>> After exporting a table of about 30M rows (each row has about 500
>>>>>> columns, totalling 400GB of data), there were several issues when
>>>>>> trying to import it again on an empty HBase. (HBase version is
>>>>>> 0.90.1-CDH3B4, deployed on 15 nodes. LZO is enabled.)
>>>>>>
>>>>>> The reason for this export/import is to both reduce the number of
>>>>>> regions and clean up regionfolders in the table that are no longer
>>>>>> referred to. (I can see this because of the dfs timestamps). Btw, I'm
>>>>>> aware of the Merge tool, which can only solve the merging part. The
>>>>>> max region size is set to 1GB, which is not an uncommon number judging
>>>>>> by other posts concerning big data sets.
>>>>>>
>>>>>> To eliminate some of the write bottlenecks, I already disabled writing
>>>>>> to the WAL by modifying the import tool. (I assume writing to the WAL
>>>>>> is not necessary during import as long as no regionservers crash. If
>>>>>> one does, I can simply recreate an empty hbase and start over.)
>>>>>>
>>>>>> Also, I temporarily set hbase.hstore.compactionThreshold and
>>>>>> hbase.hstore.blockingStoreFiles excessively high in order to disable
>>>>>> minor compactions during the time of the import. With these changes it
>>>>>> still takes about 100 hours to import the data, as opposed to the 6
>>>>>> hours it took to read it.
>>>>>> The importing starts with a single region on one node, which is split
>>>>>> when the size is exceeded. The resulting regions are spread out over
>>>>>> the other nodes, so that's not a problem. The first tasks result in
>>>>>> regionservers sometimes blocking updates because they're flushing
>>>>>> memstores. After a while (around 10% completion of the job) the logs
>>>>>> mostly show the "LRU Stats", and sometimes "Updating" / "Opening"
>>>>>> statements. Although I presumably disabled minor compactions and no
>>>>>> major compaction should be running yet, sometimes I also see
>>>>>> Compacting statements. Why is that so? In other words, what does
>>>>>> "because Region has references on open" mean?
>>>>>>
>>>>>> Aside from these performance issues, tasks are failing with region
>>>>>> offline errors. These are always regions that were just split. The
>>>>>> map/reduce framework tolerates these errors; still, I thought the
>>>>>> splitting process was transparent to the user.
>>>>>>
>>>>>> Please correct me if I'm wrong in any of my assumptions.
>>>>>>
>>>>>> Ferdy.
>
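
A minimal sketch of the pre-creating-regions suggestion from the 03/21 reply in this thread. It only computes the `byte[][]` split keys that the linked `createTable(HTableDescriptor, byte[][])` method takes; it does not call HBase. The class and method names (`PreSplitKeys`, `uniformSplits`) are hypothetical. The big assumption is that row keys are spread evenly over their first two bytes (for example, hashed keys); with a skewed key distribution the split points would have to come from the real data instead. The 400-region figure is just 400GB divided by the 1GB max region size mentioned in the original post.

```java
final class PreSplitKeys {

    /**
     * Returns numRegions - 1 split keys, evenly spaced over the two-byte
     * prefix space 0x0000..0xFFFF. ASSUMPTION: row keys are uniformly
     * distributed over their first two bytes.
     */
    static byte[][] uniformSplits(int numRegions) {
        if (numRegions < 2 || numRegions > 0x10000) {
            throw new IllegalArgumentException("numRegions must be in [2, 65536]");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary between region i-1 and region i, as a 16-bit value.
            int boundary = (int) ((long) i * 0x10000 / numRegions);
            splits[i - 1] = new byte[] { (byte) (boundary >>> 8), (byte) boundary };
        }
        return splits;
    }

    public static void main(String[] args) {
        // Roughly 400GB at a 1GB max region size suggests about 400 regions.
        byte[][] splits = uniformSplits(400);
        System.out.println("split keys: " + splits.length);

        // Small example: 4 regions give the boundaries 4000, 8000, c000.
        for (byte[] k : uniformSplits(4)) {
            System.out.println(String.format("%02x%02x", k[0] & 0xff, k[1] & 0xff));
        }
    }
}
```

The resulting array would be passed as the second argument of the `createTable` call linked above, so that the import starts out writing to many regions instead of a single one.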
