Unfortunately, at the moment I have to run everything in standalone mode on an 8-core machine with 16 GB of RAM. The good news is that the "mapreduce" side is able to terminate thanks to its Apache Flink implementation, which manages memory efficiently (and if there's not enough memory, it serializes data to disk). My HBase version is 0.98.6.1-hadoop2 with default settings except for:
export HBASE_OPTS="-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled"
export HBASE_HEAPSIZE=2000

The load on my PC is quite high because I use Flink (like Spark) to write data into HBase using TableOutputFormat with all 8 cores, but the data I'm trying to add is not that big (about 6 GB). The problem is that at some point the HBase server stops responding without logging anything.

Do you think there's any way to avoid bulk loading in my case? I was thinking of using it because, once the HFiles are produced, only the HBase import process should be running, and I wouldn't have to burden the server with WAL management and GC.

Thanks for the support,
Flavio

On Wed, Apr 8, 2015 at 5:00 PM, Ted Yu <[email protected]> wrote:

> You may have read http://hbase.apache.org/book.html#arch.bulk.load
>
> bq. using TableOutputFormat from client makes my HBase stop working
>
> Can you give us more information (hbase release, load on your cluster, log
> snippet for crashed server, etc) ?
>
> Thanks
>
> On Wed, Apr 8, 2015 at 7:52 AM, Flavio Pompermaier <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I have a non-mapreduce process that produce a lot of data that I want to
> > import into HBase through programmatically bulk loading because using
> > TableOutputFormat from client makes my HBase stop working (too many
> > writes in parallel I think).
> >
> > How can I create the necessary HFiles starting from my data (their name,
> > size, etc) and then bulk load them into HBase programmatically? Do I need
> > to use the ProtobufUtil.bulkLoadHFile or SecureBulkLoadClient, right?
> >
> > Best,
> > Flavio
