From the code I gave the link to:
https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java#L102
Hope this helps,

J-D

On Tue, Oct 4, 2011 at 7:20 AM, Christopher Dorner wrote:
> Thank you for the hint.
>
> What about autoflush then? Is that also something I can set using the config
> on job setup? Or does it only work with an HTable instance? Somehow I can't
> really find the right information :)
>
> Regards,
> Christopher
>
> On 03.10.2011 19:20, Jean-Daniel Cryans wrote:
>>
>> Options a) and b) are the same, since MultiTableOutputFormat internally
>> uses multiple HTables. See for yourself:
>>
>> https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java
>>
>> Also, you can set the write buffer by setting
>> hbase.client.write.buffer on the configuration that you pass in the
>> job setup.
>>
>> Using HTablePool in a single-threaded application doesn't offer more
>> than just storage for your HTables.
>>
>> Hope that helps,
>>
>> J-D
>>
>> On Sat, Oct 1, 2011 at 4:05 AM, Christopher Dorner wrote:
>>>
>>> Hello,
>>>
>>> I am building an RDF store using HBase and experimenting with different
>>> index tables and schema designs.
>>>
>>> For the input, I have a file where each line is an RDF triple in N3
>>> format.
>>>
>>> I need to write to multiple tables, since I need to build several index
>>> tables. To reduce I/O and avoid reading the file several times, I want
>>> to do that in one map-only job. Later the file will contain a few
>>> million triples.
>>>
>>> I am experimenting in pseudo-distributed mode so far, but will be able
>>> to run it on our cluster soon.
>>> Storing the data in the tables does not need to be speed-optimized at
>>> any cost; I just want to do it as simply and quickly as possible.
>>>
>>> What is the best way to write to more than one table in one map task?
>>>
>>> a)
>>> I can use "MultiTableOutputFormat.class" and write in map() using:
>>>     Put put = new Put(key);
>>>     put.add(kv);
>>>     context.write(tableName, put);
>>>
>>> Can I write to e.g. 6 tables in this way by creating a new Put for each
>>> table?
>>>
>>> But how can I turn off autoFlush and set writeBufferSize in this case?
>>> I think autoflush is not a good fit when putting lots of values.
>>>
>>> b)
>>> I can use an instance of HTable in the Mapper class. Then I can set
>>> autoFlush and writeBufferSize and write to the table using:
>>>     HTable table = new HTable(config, tableName);
>>>     table.put(put);
>>>
>>> But it is recommended to use only one instance of HTable, so I would
>>> need to do
>>>     table = new HTable(config, tableName);
>>> for each table I want to write to. Is that still fine with 6 tables?
>>> I stumbled upon HTablePool. Is it meant for these scenarios?
>>>
>>> Thank you and regards,
>>> Christopher
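
To make the autoflush and write buffer discussion in this thread concrete, here is a toy model in plain Java. It is not HBase code and does not use the HBase API: BufferedTable, MultiTableWriter and roundTripsFor are hypothetical names invented for this sketch, and the "flush once the buffered bytes exceed the limit" rule is an assumption chosen to mimic the idea behind autoFlush and hbase.client.write.buffer. The writer keeps one buffered table per table name, in the spirit of what the thread says about MultiTableOutputFormat using multiple HTables internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: no HBase classes are used, all names are hypothetical. */
class WriteBufferSketch {

    /** Stand-in for one table handle; counts simulated round trips to the server. */
    static class BufferedTable {
        private final boolean autoFlush;
        private final long writeBufferSize; // limit in bytes (assumed semantics)
        private final List<byte[]> pending = new ArrayList<>();
        private long pendingBytes = 0;
        private int roundTrips = 0;

        BufferedTable(boolean autoFlush, long writeBufferSize) {
            this.autoFlush = autoFlush;
            this.writeBufferSize = writeBufferSize;
        }

        /** Buffers the value; flushes at once if autoFlush is on or the buffer is over its limit. */
        void put(byte[] value) {
            pending.add(value);
            pendingBytes += value.length;
            if (autoFlush || pendingBytes > writeBufferSize) {
                flush();
            }
        }

        /** Sends everything buffered so far in one simulated round trip. */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            roundTrips++;
            pending.clear();
            pendingBytes = 0;
        }

        int roundTrips() {
            return roundTrips;
        }
    }

    /** Keeps one lazily created BufferedTable (autoFlush off) per table name. */
    static class MultiTableWriter {
        private final long writeBufferSize;
        private final Map<String, BufferedTable> tables = new HashMap<>();

        MultiTableWriter(long writeBufferSize) {
            this.writeBufferSize = writeBufferSize;
        }

        void write(String tableName, byte[] value) {
            tables.computeIfAbsent(tableName, n -> new BufferedTable(false, writeBufferSize))
                  .put(value);
        }

        int openTables() {
            return tables.size();
        }

        /** Flushes what is still buffered and returns the total number of round trips. */
        int close() {
            int total = 0;
            for (BufferedTable t : tables.values()) {
                t.flush();
                total += t.roundTrips();
            }
            return total;
        }
    }

    /** Writes count values of size bytes to one table and returns the round trips used. */
    static int roundTripsFor(boolean autoFlush, long bufferSize, int count, int size) {
        BufferedTable table = new BufferedTable(autoFlush, bufferSize);
        for (int i = 0; i < count; i++) {
            table.put(new byte[size]);
        }
        table.flush(); // final flush, as when the table is closed at the end of a task
        return table.roundTrips();
    }

    public static void main(String[] args) {
        System.out.println(roundTripsFor(true, 1024, 1000, 100));  // prints 1000
        System.out.println(roundTripsFor(false, 1024, 1000, 100)); // prints 91
    }
}
```

In this model, 1000 values of 100 bytes cost 1000 round trips with autoflush on, but only 91 with autoflush off and a 1024-byte buffer, which is the reason for wanting autoflush off when putting many values. The numbers describe the toy model only and say nothing about real HBase throughput.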
