Hi Omkar,

Your tables T1 and T2 are not that big. Are you 100% sure they can fit in memory? If so, why not distribute them to all the nodes in your MR setup, for example in a map format, using the distributed cache? Then, in your map code, you can be 100% sure that both tables are local and in memory...
JM

2013/9/11 Omkar Joshi <[email protected]>

> I'm executing MR over HBase.
> The business logic in the reducer heavily accesses two tables, say T1(40k
> rows) and T2(90k rows). Currently, I'm executing the following steps :
>
> 1. In the constructor of the reducer class, doing something like this :
>
>     HBaseCRUD hbaseCRUD = new HBaseCRUD();
>
>     HTableInterface t1= hbaseCRUD.getTable("T1",
>             "CF1", null, "C1", "C2");
>     HTableInterface t2= hbaseCRUD.getTable("T2",
>             "CF1", null, "C1", "C2");
>
> In the reduce(...)
>
>     String lowercase = ....;
>
>     /* Start : HBase code */
>     /*
>      * TRY using get(...) on the table rather than a
>      * Scan!
>      */
>     Scan scan = new Scan();
>     scan.setStartRow(lowercase.getBytes());
>     scan.setStopRow(lowercase.getBytes());
>
>     /*scan will return a single row*/
>     ResultScanner resultScanner = t1.getScanner(scan);
>
>     for (Result result : resultScanner) {
>         /*business logic*/
>     }
>
> Though not sure if the above code is sensible in first place, I have a
> question - would a get(...) provide any performance benefit over the scan?
>
>     Get get = new Get(lowercase.getBytes());
>     Result getResult = t1.get(get);
>
> Since T1 and T2 will be read-only(mostly), I think if kept in-memory, the
> performance will improve. As per HBase doc., I will have to re-create the
> tables T1 and T2. Please verify the correctness of my understanding :
>
>     public void createTables(String tableName, boolean readOnly,
>             boolean blockCacheEnabled, boolean inMemory,
>             String... columnFamilyNames) throws IOException {
>         // TODO Auto-generated method stub
>
>         HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>         /* not sure !!! */
>         tableDesc.setReadOnly(readOnly);
>
>         HColumnDescriptor columnFamily = null;
>
>         if (!(columnFamilyNames == null || columnFamilyNames.length == 0))
>         {
>
>             for (String columnFamilyName : columnFamilyNames) {
>
>                 columnFamily = new HColumnDescriptor(columnFamilyName);
>                 /*
>                  * Start : Do these steps ensure that the column
>                  * family(actually, the column data) is in-memory???
>                  */
>                 columnFamily.setBlockCacheEnabled(blockCacheEnabled);
>                 columnFamily.setInMemory(inMemory);
>                 /*
>                  * End : Do these steps ensure that the column family(actually,
>                  * the column data) is in-memory???
>                  */
>
>                 tableDesc.addFamily(columnFamily);
>             }
>         }
>
>         hbaseAdmin.createTable(tableDesc);
>         hbaseAdmin.close();
>     }
>
> Once done :
>
> 1. How to verify that the columns are in-memory and accessed from there
>    and not the disk?
> 2. Is the from-memory or from-disk read transparent to the client? In
>    simple words, do I need to change the HTable access code in my reducer
>    class? If yes, what are the changes?
>
>
> Regards,
> Omkar Joshi
>
>
>
> ________________________________
> The contents of this e-mail and any attachment(s) may contain confidential
> or privileged information for the intended recipient(s). Unintended
> recipients are prohibited from taking action on the basis of information in
> this e-mail and using or disseminating the information, and must notify the
> sender and delete it from their system. L&T Infotech will not accept
> responsibility or liability for the accuracy or completeness of, or the
> presence of any virus or disabling code in this e-mail"
>
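
For illustration, the suggestion at the top of this message (ship T1 and T2 to every node and keep them in memory) could look roughly like the sketch below. It covers only the in-memory part: a small table exported as a text file is read once into a HashMap. The class name LookupTableLoader, the one-row-per-line "rowKey<TAB>value" file format, and the idea of exporting T1/T2 to such files are all assumptions made for this example, not something from the thread. Registering the file with the distributed cache and locating the node-local copy are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: loads a small lookup table from a local file into memory. */
final class LookupTableLoader {

    private LookupTableLoader() {
        // static utility, not meant to be instantiated
    }

    /**
     * Reads "rowKey<TAB>value" lines into a HashMap.
     * The file format is an assumption made for this sketch.
     */
    static Map<String, String> load(Path localFile) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader =
                Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip blank or malformed lines
                }
                // Everything before the first tab is the row key, the rest is the value.
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return table;
    }
}
```

With something like this, load(...) would be called once per task (for example from the task's setup method rather than on every reduce call), and the per-record HBase lookup would become a plain table.get(lowercase) on the map. With 40k and 90k rows, both maps should normally be small, but that depends on the row sizes, so it is worth checking against the task heap.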
