I'm running MapReduce (MR) jobs over HBase.
The business logic in the reducer heavily accesses two tables, say T1 (40k rows)
and T2 (90k rows). Currently, I'm executing the following steps:
1. In the constructor of the reducer class, I do something like this:
    HBaseCRUD hbaseCRUD = new HBaseCRUD();
    HTableInterface t1 = hbaseCRUD.getTable("T1", "CF1", null, "C1", "C2");
    HTableInterface t2 = hbaseCRUD.getTable("T2", "CF1", null, "C1", "C2");
2. In reduce(...):
    String lowercase = ....;
    /* Start : HBase code */
    /*
     * TRY using get(...) on the table rather than a
     * Scan!
     */
    Scan scan = new Scan();
    scan.setStartRow(lowercase.getBytes());
    scan.setStopRow(lowercase.getBytes());
    /* scan will return a single row */
    ResultScanner resultScanner = t1.getScanner(scan);
    for (Result result : resultScanner) {
        /* business logic */
    }
I'm not sure whether the above code is sensible in the first place, but my
question is: would a get(...) provide any performance benefit over the scan?
    Get get = new Get(lowercase.getBytes());
    Result getResult = t1.get(get);
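To make the question concrete without a cluster, here is a minimal stand-in
sketch of the two access patterns. It is plain Java, not HBase API: a TreeMap
plays the role of T1's sorted row keys, and RowLookupSketch, getRow and
scanRange are hypothetical names. It assumes, as the comment in my reducer code
does, that a scan whose start and stop rows are equal reads just that one row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stand-in only: a sorted map plays the role of T1's row-key ordering.
// None of this is HBase API; all names are hypothetical.
class RowLookupSketch {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Point lookup, analogous in spirit to a Get: one key, at most one value.
    String getRow(String rowKey) {
        return rows.get(rowKey);
    }

    // Range read, analogous in spirit to a Scan over [startRow, stopRow).
    // Assumption (taken from the comment in my reducer code): equal start
    // and stop rows are treated as a request for that single row.
    List<String> scanRange(String startRow, String stopRow) {
        boolean singleRow = startRow.equals(stopRow);
        return new ArrayList<>(
                rows.subMap(startRow, true, stopRow, singleRow).values());
    }

    public static void main(String[] args) {
        RowLookupSketch t1 = new RowLookupSketch();
        t1.put("apple", "v1");
        t1.put("banana", "v2");
        t1.put("cherry", "v3");
        System.out.println(t1.getRow("banana"));
        System.out.println(t1.scanRange("banana", "banana"));
        System.out.println(t1.scanRange("apple", "cherry"));
    }
}
```

Both calls return the same row here, so what I am really asking is whether the
scanner setup adds overhead compared with the direct lookup.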
Since T1 and T2 will be (mostly) read-only, I think performance will improve if
they are kept in memory. As per the HBase documentation, I will have to
re-create tables T1 and T2. Please verify whether my understanding is correct:
    public void createTables(String tableName, boolean readOnly,
            boolean blockCacheEnabled, boolean inMemory,
            String... columnFamilyNames) throws IOException {
        HTableDescriptor tableDesc = new HTableDescriptor(tableName);
        /* not sure !!! */
        tableDesc.setReadOnly(readOnly);
        HColumnDescriptor columnFamily = null;
        if (!(columnFamilyNames == null || columnFamilyNames.length == 0)) {
            for (String columnFamilyName : columnFamilyNames) {
                columnFamily = new HColumnDescriptor(columnFamilyName);
                /*
                 * Start : Do these steps ensure that the column
                 * family (actually, the column data) is in-memory???
                 */
                columnFamily.setBlockCacheEnabled(blockCacheEnabled);
                columnFamily.setInMemory(inMemory);
                /*
                 * End : Do these steps ensure that the column family
                 * (actually, the column data) is in-memory???
                 */
                tableDesc.addFamily(columnFamily);
            }
        }
        hbaseAdmin.createTable(tableDesc);
        hbaseAdmin.close();
    }
Once done:
1. How do I verify that the columns are in memory and are read from there
rather than from disk?
2. Is the from-memory or from-disk read transparent to the client? In other
words, do I need to change the HTable access code in my reducer class? If yes,
what changes are needed?
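Separately from the server-side in-memory setting, one fallback I could
consider is caching looked-up rows inside the reducer itself. Below is a
minimal sketch under that assumption. It is plain Java, not HBase API:
ReadThroughCache, maxEntries and loader are hypothetical names, and the loader
stands in for whatever table lookup the reducer performs (for example a call
such as t1.get(...)).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of HBase: a small read-through LRU cache
// that could sit in front of repeated row lookups in the reducer.
class ReadThroughCache<K, V> {
    private final Function<K, V> loader; // stand-in for a table lookup
    private final Map<K, V> cache;
    private int loads = 0;               // number of times the loader ran

    ReadThroughCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered least-recently-used first,
        // so removeEldestEntry evicts the least recently used entry.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, or loads and caches it on a miss.
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        ReadThroughCache<String, String> cache =
                new ReadThroughCache<>(2, key -> key.toUpperCase());
        cache.get("a");
        cache.get("a"); // served from the cache, loader not called again
        System.out.println(cache.loadCount());
    }
}
```

With 40k and 90k rows, the cache size would have to be chosen against the
reducer's heap, and this only helps if the same row keys recur across
reduce(...) calls.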
Regards,
Omkar Joshi