Re: HBase : get(...) vs scan and in-memory table

Jean-Marc Spaggiari Wed, 11 Sep 2013 04:38:49 -0700

Hi Omkar,

Your tables T1 and T2 are not that big. Are you 100% sure they can fit in memory?
If so, why not distribute them to all the nodes in your MR setup, in a
key/value (map) format, using the distributed cache? Then, in your map
code, you will be 100% sure that both tables are local and in memory...
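A minimal sketch of the lookup side of that idea. It assumes T1 has been exported to a tab-separated text file (one rowKey<TAB>value line per row) and that the distributed cache has placed a local copy on each node. The file format, the file name T1.tsv and the SideTableLookup class are all hypothetical. Loading the file once per task (for example from setup(...)) turns every lookup into a HashMap access with no RPC to HBase:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: loads a side table (e.g. T1) that was exported to a
 * tab-separated text file ("rowKey<TAB>value" per line) and shipped to every
 * node, e.g. via the distributed cache. Format and file name are assumptions.
 */
public class SideTableLookup {

    /** Parses "rowKey\tvalue" lines into an in-memory map; malformed lines are skipped. */
    public static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> table = new HashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab <= 0) {
                continue; // no separator, or an empty row key: skip the line
            }
            table.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return table;
    }

    /** Reads the node-local copy of the exported table. */
    public static Map<String, String> load(Path localFile) throws IOException {
        return parse(Files.readAllLines(localFile, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; in a real task this would run once, e.g. in setup(...),
        // and the map would then be queried from map(...) or reduce(...).
        Path file = Paths.get(args.length > 0 ? args[0] : "T1.tsv");
        if (!Files.exists(file)) {
            System.out.println("No local copy found at " + file);
            return;
        }
        Map<String, String> t1 = load(file);
        System.out.println("Loaded " + t1.size() + " rows into memory");
    }
}
```

With 40k and 90k rows, both maps should fit in a task's heap unless the stored values are large.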


JM


2013/9/11 Omkar Joshi <[email protected]>

> I'm executing MR over HBase.
> The business logic in the reducer heavily accesses two tables, say T1 (40k
> rows) and T2 (90k rows). Currently, I'm executing the following steps:
> 1. In the constructor of the reducer class, doing something like this:
> HBaseCRUD hbaseCRUD = new HBaseCRUD();
>
> HTableInterface t1= hbaseCRUD.getTable("T1",
>                             "CF1", null, "C1", "C2");
> HTableInterface t2= hbaseCRUD.getTable("T2",
>                             "CF1", null, "C1", "C2");
> 2. In reduce(...):
> String lowercase = ....;
>
> /* Start : HBase code */
> /*
>  * TRY using get(...) on the table rather than a
>  * Scan!
>  */
> Scan scan = new Scan();
> scan.setStartRow(lowercase.getBytes());
> scan.setStopRow(lowercase.getBytes());
>
> /* scan will return a single row */
> ResultScanner resultScanner = t1.getScanner(scan);
> try {
>     for (Result result : resultScanner) {
>         /* business logic */
>     }
> } finally {
>     resultScanner.close(); // release the server-side scanner
> }
> Though I'm not sure the above code is sensible in the first place, I have a
> question: would a get(...) provide any performance benefit over the scan?
> Get get = new Get(lowercase.getBytes());
> Result getResult = t1.get(get);
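To make the difference concrete, here is a toy model that uses a java.util.TreeMap as a stand-in for a table sorted by row key. GetVsScanModel is hypothetical and is not the HBase client. A get is a single keyed lookup. A scan walks a row range from the start row up to, but not including, the stop row, which is how HBase documents setStopRow. Under that strict rule the range [row, row) is empty, so a one-row scan needs a stop key just past the row, such as row + "\0":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model (not the HBase API): a table as a map sorted by row key. */
public class GetVsScanModel {

    private final NavigableMap<String, String> rows = new TreeMap<>();

    public void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Get: one keyed lookup; returns null when the row does not exist. */
    public String get(String rowKey) {
        return rows.get(rowKey);
    }

    /** Scan: values of all rows in [startRow, stopRow), i.e. the stop row is exclusive. */
    public List<String> scan(String startRow, String stopRow) {
        if (startRow.compareTo(stopRow) >= 0) {
            return new ArrayList<>(); // empty or inverted range
        }
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).values());
    }

    /** Smallest key that sorts after rowKey: append the lowest possible char. */
    public static String nextKey(String rowKey) {
        return rowKey + "\0";
    }

    public static void main(String[] args) {
        GetVsScanModel t1 = new GetVsScanModel();
        t1.put("apple", "1");
        t1.put("banana", "2");
        System.out.println(t1.get("apple"));                    // 1
        System.out.println(t1.scan("apple", "apple"));          // []
        System.out.println(t1.scan("apple", nextKey("apple"))); // [1]
    }
}
```

The reducer code above relies on equal start and stop rows returning that one row. A Get states the single-row intent directly and typically needs a single round trip, whereas a scanner is opened, read and closed, so Get is the simpler choice for a point lookup.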
> Since T1 and T2 will be (mostly) read-only, I think the performance will
> improve if they are kept in memory. As per the HBase docs, I will have to
> re-create the tables T1 and T2. Please verify whether my understanding is correct:
> public void createTables(String tableName, boolean readOnly,
>         boolean blockCacheEnabled, boolean inMemory,
>         String... columnFamilyNames) throws IOException {
>
>     HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>     /* not sure !!! */
>     tableDesc.setReadOnly(readOnly);
>
>     if (columnFamilyNames != null && columnFamilyNames.length > 0) {
>         for (String columnFamilyName : columnFamilyNames) {
>             HColumnDescriptor columnFamily = new HColumnDescriptor(columnFamilyName);
>             /*
>              * Start : Do these steps ensure that the column
>              * family (actually, the column data) is in-memory???
>              */
>             columnFamily.setBlockCacheEnabled(blockCacheEnabled);
>             columnFamily.setInMemory(inMemory);
>             /*
>              * End : Do these steps ensure that the column
>              * family (actually, the column data) is in-memory???
>              */
>
>             tableDesc.addFamily(columnFamily);
>         }
>     }
>
>     hbaseAdmin.createTable(tableDesc);
>     hbaseAdmin.close();
> }
> Once that's done:
>
>  1. How can I verify that the columns are in memory and read from there,
> not from disk?
>  2. Is the from-memory vs. from-disk read transparent to the client? In
> simple words, do I need to change the HTable access code in my reducer
> class? If yes, what are the changes?
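On both questions: the in-memory flag is a column family setting and reads stay transparent, so the Get/Scan code in the reducer does not change. As far as I know, the HBase reference guide also says IN_MEMORY only gives the family's blocks a higher priority in the block cache; it does not guarantee that the whole table stays in memory. The practical check is therefore the block cache hit and miss statistics that the region servers report. The toy read-through LRU cache below (CountingLruCache is hypothetical and is not HBase's block cache) shows the idea: the caller's get(...) is identical on a hit or a miss, and only the counters reveal where the value came from:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy read-through LRU cache (not HBase's block cache) with hit/miss counters. */
public class CountingLruCache<K, V> {

    private final Function<K, V> slowStore; // stands in for reading from disk
    private final Map<K, V> cache;
    private long hits;
    private long misses;

    public CountingLruCache(int capacity, Function<K, V> slowStore) {
        this.slowStore = slowStore;
        // accessOrder=true keeps least-recently-used entries first, so
        // removeEldestEntry evicts the LRU entry once capacity is exceeded.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** The caller's read looks the same whether the value comes from cache or store. */
    public V get(K key) {
        V value = cache.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = slowStore.apply(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    public long hits() {
        return hits;
    }

    public long misses() {
        return misses;
    }
}
```

If the hit ratio stays high while the reducers run, the reads are being served from memory; if it does not, the cache is too small for the working set, whatever the family's flag says.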
>
>
> Regards,
> Omkar Joshi
>
