Hi Serega,

Bulk load just "pushes" the files into an HBase region, so the load itself should not be an issue. The splits, however, might take some time, because HBase will have to split the region again and again until it becomes small enough. So if your max file size is 10GB, it will split the 200GB into 100GB, then 50GB, then 25GB, then 12.5GB, then 6.25GB... Each time, everything will be rewritten. A LOT of wasted IOs.
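To put rough numbers on that, here is a minimal Python sketch of the halving described above. It is only back-of-the-envelope arithmetic, not how HBase actually decides when to split: it assumes every split produces two equal halves, that each round rewrites the whole data set once, and it ignores the configured split policy. The function name `split_cascade` is made up for this illustration.

```python
def split_cascade(total_gb, max_region_gb):
    """Estimate repeated halving of one region until it fits max_region_gb.

    Assumptions (illustrative only): splits are perfectly even and every
    round rewrites the full data set once.
    Returns (rounds, regions, final_region_gb, rewritten_gb).
    """
    rounds = 0
    regions = 1
    size = float(total_gb)
    while size > max_region_gb:
        size /= 2       # each region is cut in half
        regions *= 2    # so the region count doubles
        rounds += 1
    return rounds, regions, size, rounds * total_gb


# 200GB input, 10GB max file size, as in this thread
print(split_cascade(200, 10))  # (5, 32, 6.25, 1000)
```

Under those assumptions, a 200GB load ends up as 32 regions of 6.25GB after 5 rounds, with roughly 1TB rewritten along the way.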
So the answer is: yes, HBase can handle it, BUT it's not a good practice. Better to split the table beforehand and generate the bulk load files based on the pre-split regions. Also, it might affect the other tables, because HBase will have to do massive IOs, which in the end might impact performance across the cluster.

JM

2014-10-02 15:03 GMT-04:00 Serega Sheypak <[email protected]>:

> Hi, I'm doing HBase bulk load to an empty table.
> Input data size is 200GB
> Is it OK to load data into one default region and then wait while HBase
> splits 200GB region?
>
> I don't have any SLA for initial load. I can wait until HBase splits
> initial load files.
> This table is READ only.
>
> The only consideration is not affect others tables and do not cause HBase
> cluster degradation.
>
