One other option: have your map() method emit NullWritable (i.e. produce no job output) and handle the put() calls to the table(s) yourself inside map(). That way you can also set autoflush on each table from within your job.
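To make the idea concrete, here is a rough sketch in plain Java of the lifecycle I mean: open one handle per table in setup(), buffer the puts in map(), and flush whatever is left in cleanup(). It deliberately does not use the HBase classes, so it runs on its own. BufferedTable and MultiTableWriter are hypothetical stand-ins I made up for illustration, and the "spo"/"pos"/"osp" table names and the '|' key separator are assumptions as well. In a real mapper the stand-in's role would be played by an HTable with setAutoFlush(false) and setWriteBufferSize(...), with flushCommits() called in cleanup().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one table handle with autoflush turned off.
// It only models the buffering; nothing is sent anywhere.
class BufferedTable {
    private final long writeBufferSize;            // limit in bytes
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private int flushCount = 0;
    private int rowsWritten = 0;

    BufferedTable(long writeBufferSize) {
        this.writeBufferSize = writeBufferSize;
    }

    // Buffer the row; flush only once the buffered bytes reach the limit.
    void put(byte[] rowKey) {
        buffer.add(rowKey);
        bufferedBytes += rowKey.length;
        if (bufferedBytes >= writeBufferSize) {
            flushCommits();
        }
    }

    // "Send" everything that is buffered as one batch.
    void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        rowsWritten += buffer.size();
        buffer.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    int pending() { return buffer.size(); }
    int flushCount() { return flushCount; }
    int rowsWritten() { return rowsWritten; }
}

// Mirrors the mapper lifecycle: setup() once, map() per triple, cleanup() once.
class MultiTableWriter {
    private final Map<String, BufferedTable> tables = new LinkedHashMap<>();

    // Open one buffered handle per index table.
    void setup(List<String> tableNames, long writeBufferSize) {
        for (String name : tableNames) {
            tables.put(name, new BufferedTable(writeBufferSize));
        }
    }

    // One input triple fans out to every index table, each with its own key order.
    void map(String s, String p, String o) {
        for (Map.Entry<String, BufferedTable> e : tables.entrySet()) {
            String key = rowKey(e.getKey(), s, p, o);
            e.getValue().put(key.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Without this final flush the rows still sitting in the buffers would be lost.
    void cleanup() {
        for (BufferedTable t : tables.values()) {
            t.flushCommits();
        }
    }

    BufferedTable table(String name) { return tables.get(name); }

    // Builds the key in the order spelled by the table name, e.g. "pos" gives p|o|s.
    static String rowKey(String order, String s, String p, String o) {
        StringBuilder sb = new StringBuilder();
        for (char c : order.toCharArray()) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(c == 's' ? s : c == 'p' ? p : o);
        }
        return sb.toString();
    }
}
```

The part that is easy to forget is the cleanup() step: with autoflush off, anything still in the buffer when the task ends is only written if you flush it explicitly.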
> Date: Tue, 4 Oct 2011 16:20:25 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: Best way to write to multiple tables in one map-only job
>
> Thank you for the hint.
>
> What about autoflush then? Is that also something I can set using the
> config on job setup? Or does it only work with an HTable instance?
> Somehow I can't really find the right information :)
>
> Regards,
> Christopher
>
> On 03.10.2011 19:20, Jean-Daniel Cryans wrote:
> > Option a) and b) are the same since MultiTableOutputFormat internally
> > uses multiple HTables. See for yourself:
> >
> > https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java
> >
> > Also you can set the write buffer by setting
> > hbase.client.write.buffer on the configuration that you pass in the
> > job setup.
> >
> > Using HTablePool in a single threaded application doesn't offer more
> > than just storage for your HTables.
> >
> > Hope that helps,
> >
> > J-D
> >
> > On Sat, Oct 1, 2011 at 4:05 AM, Christopher Dorner
> > <[email protected]> wrote:
> >> Hello,
> >>
> >> I am building an RDF store using HBase and experimenting with different
> >> index tables and schema designs.
> >>
> >> For the input, I have a file where each line is an RDF triple in N3 format.
> >>
> >> I need to write to multiple tables since I need to build several index
> >> tables. For the sake of reducing IO and not reading the file a few times I
> >> want to do that in one map-only job. Later the file will contain a few
> >> million triples.
> >>
> >> I am experimenting in pseudo-distributed mode so far but will be able to
> >> run it on our cluster soon.
> >> Storing the data in the tables does not need to be speed-optimized at any
> >> cost, but I just want to do it as simply and quickly as possible.
> >>
> >>
> >> What is the best way to write to more than 1 table in one map task?
> >>
> >> a)
> >> I can either use "MultiTableOutputFormat.class" and write in map() using:
> >> Put put = new Put(key);
> >> put.add(kv);
> >> context.write(tableName, put);
> >>
> >> Can I write to e.g. 6 tables in this way by creating a new Put for each
> >> table?
> >>
> >> But how can I turn off autoFlush and set writeBufferSize in this case?
> >> Because I think autoflush is not that good in this case of putting lots of
> >> values.
> >>
> >>
> >> b)
> >> I can use an instance of HTable in the Mapper class. Then I can set
> >> autoFlush and writeBufferSize and write to the table using:
> >> HTable table = new HTable(config, tableName);
> >> table.put(put);
> >>
> >> But it is recommended to use only one instance of HTable, so I would need
> >> to do
> >> table = new HTable(config, tableName);
> >> for each table I want to write to. Is that still fine with 6 tables?
> >> I stumbled upon HTablePool. Is this for these scenarios?
> >>
> >>
> >> Thank you and regards,
> >> Christopher
> >>
