Thank you for the hint.
What about autoflush then? Is that also something I can set using the
config on job setup? Or does it only work with an HTable instance?
Somehow I can't really find the right information :)
Regards,
Christopher
On 03.10.2011 19:20, Jean-Daniel Cryans wrote:
Option a) and b) are the same since MultiTableOutputFormat internally
uses multiple HTables. See for yourself:
https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java
Also, you can set the write buffer by setting
hbase.client.write.buffer on the configuration that you pass in the
job setup.
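
As a rough sketch of what that could look like in the job driver: the
fragment below sets the property on the configuration before the job is
created. The 8 MB value and the job name are placeholders chosen for
illustration, not recommendations, and the fragment assumes the Hadoop
and HBase client jars are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.mapreduce.Job;

// Fragment for the body of the driver's main(); it is not a complete class.
Configuration conf = HBaseConfiguration.create();

// Client-side write buffer size in bytes (8 MB here is an arbitrary example).
conf.setLong("hbase.client.write.buffer", 8L * 1024 * 1024);

// "rdf-index-load" is a hypothetical job name. Newer Hadoop releases prefer
// Job.getInstance(conf, name) over this constructor.
Job job = new Job(conf, "rdf-index-load");
job.setOutputFormatClass(MultiTableOutputFormat.class);
job.setNumReduceTasks(0); // map-only job, no reducers
```

The property has to be set before the Job is constructed, because the job
takes its own copy of the configuration at that point.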
Using HTablePool in a single threaded application doesn't offer more
than just storage for your HTables.
Hope that helps,
J-D
On Sat, Oct 1, 2011 at 4:05 AM, Christopher Dorner
<[email protected]> wrote:
Hello,
I am building an RDF store using HBase and experimenting with different index
tables and schema designs.
For the input, I have a file where each line is an RDF triple in N3 format.
I need to write to multiple tables since I need to build several index
tables. To reduce I/O and avoid reading the file several times, I
want to do that in one map-only job. Later the file will contain a few
million triples.
I am experimenting in pseudo-distributed mode so far but will be able to run
it on our cluster soon.
Storing the data in the tables does not need to be speed-optimized at any
cost, but I just want to do it as simply and quickly as possible.
What is the best way to write to more than 1 table in one Map-Task?
a)
I can either use "MultiTableOutputFormat.class" and write in map() using:
Put put = new Put(key);
put.add(kv);
context.write(tableName, put);
Can I write to e.g. 6 tables in this way by creating a new Put for each
table?
But how can I turn off autoFlush and set writeBufferSize in this case?
I think autoflush is not a good fit for this case of putting lots of
values.
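
To make the autoflush concern concrete, here is a toy model in plain Java.
It is not HBase code and does not describe HBase internals: ToyTable,
simulate, and the sizes used are all made up for illustration. It only
counts how many simulated round trips happen when every put is sent
immediately versus when puts are collected until a size limit is exceeded.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a client-side write buffer. Not HBase code. */
class WriteBufferSketch {

    /** Hypothetical stand-in for a table client; counts simulated round trips. */
    static final class ToyTable {
        private final boolean autoFlush;
        private final long bufferLimitBytes;
        private final List<byte[]> pending = new ArrayList<>();
        private long pendingBytes = 0;
        private int roundTrips = 0;

        ToyTable(boolean autoFlush, long bufferLimitBytes) {
            this.autoFlush = autoFlush;
            this.bufferLimitBytes = bufferLimitBytes;
        }

        /** Buffers one payload; flushes at once if autoFlush is on or the limit is exceeded. */
        void put(byte[] payload) {
            pending.add(payload);
            pendingBytes += payload.length;
            if (autoFlush || pendingBytes > bufferLimitBytes) {
                flush();
            }
        }

        /** Sends everything buffered so far as one simulated round trip. */
        void flush() {
            if (pending.isEmpty()) {
                return; // nothing to send, so no round trip is counted
            }
            roundTrips++;
            pending.clear();
            pendingBytes = 0;
        }

        int roundTrips() {
            return roundTrips;
        }
    }

    /** Writes `puts` payloads of `payloadSize` bytes and returns the round-trip count. */
    static int simulate(boolean autoFlush, long limitBytes, int puts, int payloadSize) {
        ToyTable table = new ToyTable(autoFlush, limitBytes);
        for (int i = 0; i < puts; i++) {
            table.put(new byte[payloadSize]);
        }
        table.flush(); // final flush, as one would do when the task finishes
        return table.roundTrips();
    }

    public static void main(String[] args) {
        // 1000 puts of 100 bytes each.
        System.out.println("autoFlush on:  " + simulate(true, 0, 1000, 100));       // prints 1000
        System.out.println("autoFlush off: " + simulate(false, 10_000, 1000, 100)); // prints 10
    }
}
```

In this model the buffered variant needs 10 round trips instead of 1000 for
the same data, which is the effect I am hoping for from turning autoFlush
off and raising writeBufferSize.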
b)
I can use an instance of HTable in the Mapper class. Then I can set
autoFlush and writeBufferSize and write to the table using:
HTable table = new HTable(config, tableName);
table.put(put);
But it is recommended to use only one instance of HTable per table, so I
would need to do
table = new HTable(config, tableName);
for each table I want to write to. Is that still fine with 6 tables?
I stumbled upon HTablePool. Is it meant for these scenarios?
Thank you and regards,
Christopher