HBase : get(...) vs scan and in-memory table

Omkar Joshi Wed, 11 Sep 2013 02:21:35 -0700

I'm running a MapReduce job over HBase.
The business logic in the reducer heavily accesses two tables, say T1 (40k rows) 
and T2 (90k rows). Currently, I'm executing the following steps:
1. In the constructor of the reducer class, I do something like this:
HBaseCRUD hbaseCRUD = new HBaseCRUD();


HTableInterface t1= hbaseCRUD.getTable("T1",
                            "CF1", null, "C1", "C2");
HTableInterface t2= hbaseCRUD.getTable("T2",
                            "CF1", null, "C1", "C2");
2. In reduce(...), I do the following:
 String lowercase = ....;

/* Start : HBase code */
/*
* TRY using get(...) on the table rather than a
* Scan!
*/
Scan scan = new Scan();
scan.setStartRow(lowercase.getBytes());
scan.setStopRow(lowercase.getBytes());

/*scan will return a single row*/
ResultScanner resultScanner = t1.getScanner(scan);

for (Result result : resultScanner) {
/*business logic*/
}
Though I'm not sure the above code is sensible in the first place, I have a 
question: would a get(...) provide any performance benefit over the scan?
Get get = new Get(lowercase.getBytes());
Result getResult = t1.get(get);
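One detail that applies to both the scan and the get above: lowercase.getBytes() encodes with the JVM's default charset, so the row-key bytes could differ between machines if a key ever contains non-ASCII characters. Below is a minimal, stdlib-only sketch of encoding the key explicitly as UTF-8, which is also the encoding HBase's Bytes.toBytes(String) uses. The class name RowKeyEncoding, the helper rowKey and the sample key are hypothetical and only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeyEncoding {

    // Hypothetical helper: encode a row key with an explicit charset so the
    // resulting bytes do not depend on the platform default.
    static byte[] rowKey(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lowercase = "caf\u00e9"; // sample key with one non-ASCII character

        byte[] explicit = rowKey(lowercase);
        byte[] platformDefault = lowercase.getBytes(); // depends on the JVM's default charset

        System.out.println("UTF-8 bytes:            " + Arrays.toString(explicit));
        System.out.println("platform-default bytes: " + Arrays.toString(platformDefault));
        System.out.println("identical on this JVM:  " + Arrays.equals(explicit, platformDefault));
    }
}
```

In the reducer, the same idea would mean building the Get or Scan from rowKey(lowercase) (or Bytes.toBytes(lowercase)) rather than from lowercase.getBytes().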
Since T1 and T2 will be (mostly) read-only, I think performance will improve 
if they are kept in memory. As per the HBase documentation, I will have to 
re-create the tables T1 and T2. Please verify the correctness of my understanding:
public void createTables(String tableName, boolean readOnly,
            boolean blockCacheEnabled, boolean inMemory,
            String... columnFamilyNames) throws IOException {
        // TODO Auto-generated method stub

        HTableDescriptor tableDesc = new HTableDescriptor(tableName);
        /* not sure !!! */
        tableDesc.setReadOnly(readOnly);

        HColumnDescriptor columnFamily = null;

        if (!(columnFamilyNames == null || columnFamilyNames.length == 0)) {

            for (String columnFamilyName : columnFamilyNames) {

                columnFamily = new HColumnDescriptor(columnFamilyName);
                /*
                 * Start : Do these steps ensure that the column
                 * family(actually, the column data) is in-memory???
                 */
                columnFamily.setBlockCacheEnabled(blockCacheEnabled);
                columnFamily.setInMemory(inMemory);
                /*
                 * End : Do these steps ensure that the column family(actually,
                 * the column data) is in-memory???
                 */

                tableDesc.addFamily(columnFamily);
            }
        }

        hbaseAdmin.createTable(tableDesc);
        hbaseAdmin.close();
    }
Once done:

 1.  How do I verify that the columns are in memory and are read from there 
rather than from disk?
 2.  Is a from-memory versus from-disk read transparent to the client? In other 
words, do I need to change the HTable access code in my reducer class? If yes, 
what are the changes?


Regards,
Omkar Joshi



