Options a) and b) are effectively the same, since MultiTableOutputFormat internally uses multiple HTables. See for yourself:
https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java

Also, you can set the write buffer size by setting hbase.client.write.buffer
on the configuration that you pass in during job setup.

Using HTablePool in a single-threaded application doesn't offer more than
just storage for your HTables.

Hope that helps,

J-D

On Sat, Oct 1, 2011 at 4:05 AM, Christopher Dorner <[email protected]> wrote:
> Hello,
>
> I am building an RDF store using HBase and experimenting with different
> index tables and schema designs.
>
> For the input, I have a file where each line is an RDF triple in N3 format.
>
> I need to write to multiple tables since I need to build several index
> tables. To reduce IO and avoid reading the file several times, I want to
> do that in one map-only job. Later the file will contain a few million
> triples.
>
> I am experimenting in pseudo-distributed mode so far but will be able to
> run it on our cluster soon.
> Storing the data in the tables does not need to be speed-optimized at any
> cost, but I just want to do it as simply and quickly as possible.
>
>
> What is the best way to write to more than one table in one map task?
>
> a)
> I can either use "MultiTableOutputFormat.class" and write in map() using:
> Put put = new Put(key);
> put.add(kv);
> context.write(tableName, put);
>
> Can I write to e.g. 6 tables in this way by creating a new Put for each
> table?
>
> But how can I turn off autoFlush and set writeBufferSize in this case?
> I think autoFlush is not a good idea when putting lots of values.
>
>
> b)
> I can use an instance of HTable in the Mapper class. Then I can set
> autoFlush and writeBufferSize and write to the table using:
> HTable table = new HTable(config, tableName);
> table.put(put);
>
> But it is recommended to use only one instance of HTable, so I would need
> to do
> table = new HTable(config, tableName);
> for each table I want to write to. Is that still fine with 6 tables?
> I stumbled upon HTablePool. Is it meant for these scenarios?
>
>
> Thank you and regards,
> Christopher
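
For illustration, here is a minimal sketch of the map-only setup discussed in this thread: one job, MultiTableOutputFormat as the output format, and hbase.client.write.buffer set on the configuration handed to the job. Nothing below comes from the original messages beyond those two names. The class name TripleIndexJob, the table names "spo" and "pos", the column family "f", the 8 MB buffer value, and the naive whitespace split of a triple are all assumptions made for the example. The calls Put.add(family, qualifier, value) and new Job(conf, name) match the HBase and Hadoop client APIs of that period; newer releases deprecate or rename them. Judging from the linked source, MultiTableOutputFormat appears to turn autoFlush off on the HTables it opens, which is why only the buffer size is set here.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class TripleIndexJob {

  // Hypothetical index tables and column family, for illustration only.
  static final ImmutableBytesWritable SPO = new ImmutableBytesWritable(Bytes.toBytes("spo"));
  static final ImmutableBytesWritable POS = new ImmutableBytesWritable(Bytes.toBytes("pos"));
  static final byte[] FAMILY = Bytes.toBytes("f");

  // Naive split into subject, predicate, object. Real N3 parsing is more
  // involved. Returns null when the line does not have three parts.
  static String[] splitTriple(String line) {
    String s = line.trim();
    if (s.endsWith(".")) {
      s = s.substring(0, s.length() - 1).trim(); // drop the statement terminator
    }
    String[] parts = s.split("\\s+", 3);
    return parts.length == 3 ? parts : null;
  }

  public static class TripleMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] t = splitTriple(line.toString());
      if (t == null) {
        return; // skip lines that are not simple triples
      }
      // One Put per index table; the output key names the target table.
      Put spo = new Put(Bytes.toBytes(t[0]));
      spo.add(FAMILY, Bytes.toBytes(t[1]), Bytes.toBytes(t[2]));
      context.write(SPO, spo);

      Put pos = new Put(Bytes.toBytes(t[1]));
      pos.add(FAMILY, Bytes.toBytes(t[2]), Bytes.toBytes(t[0]));
      context.write(POS, pos);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Client-side write buffer in bytes (8 MB is an arbitrary example value).
    conf.setLong("hbase.client.write.buffer", 8L * 1024 * 1024);

    Job job = new Job(conf, "triple-index");
    job.setJarByClass(TripleIndexJob.class);
    job.setMapperClass(TripleMapper.class);
    job.setNumReduceTasks(0); // map-only
    job.setOutputFormatClass(MultiTableOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Writing to six tables would follow the same pattern: one more table-name constant and one more Put per table inside map(). The target tables are assumed to exist already with the column family used here.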
