[ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181572#comment-13181572 ]
Zhihong Yu edited comment on HBASE-5140 at 1/6/12 8:40 PM:
-----------------------------------------------------------

I suggest utilizing this method in HTable:
{code}
public Pair<byte[][],byte[][]> getStartEndKeys() throws IOException {
{code}
i.e. the start and end keys returned by the above method are passed to the splitter interface.

was (Author: zhi...@ebaysf.com):
I suggest utilizing this method in HTable:
{code}
public Pair<byte[][],byte[][]> getStartEndKeys() throws IOException {
{code}
i.e. start and end keys are passed to the splitter interface.

> TableInputFormat subclass to allow N number of splits per region during MR jobs
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-5140
>                 URL: https://issues.apache.org/jira/browse/HBASE-5140
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>            Reporter: Josh Wymer
>            Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I
> am working on a subclass of the TableInputFormat class that overrides
> getSplits in order to generate N splits per region and/or N splits per job.
> The idea is to convert the startKey and endKey for each region from byte[]
> to BigDecimal, take the difference, divide by N, convert back to byte[], and
> generate splits on the resulting values. Assuming your keys are uniformly
> distributed, this should generate splits with nearly the same number of rows
> each. Any suggestions on this issue are welcome.
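
The key arithmetic described in the issue (byte[] to number, take the difference, divide by N, convert back to byte[]) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch proposed in the issue: it uses BigInteger instead of the BigDecimal mentioned above, the class name KeyRangeSplitter and its methods are hypothetical, and it does not handle an empty end key (the last region of a table). In a real job the start and end keys would presumably come from HTable.getStartEndKeys(), which is not called here so the sketch stays free of HBase dependencies.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: splits one region's key range
// into n sub-ranges by treating row keys as unsigned big-endian integers.
final class KeyRangeSplitter {

    // Returns n + 1 boundary keys: bounds[0] is start, bounds[n] is end,
    // and bounds[1..n-1] are evenly spaced interior split points.
    static byte[][] split(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Right-pad the shorter key with zero bytes so both keys have the
        // same width; this keeps the byte-wise ordering of the two keys.
        int width = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            // Also rejects an empty end key, which this sketch does not handle.
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);
        byte[][] bounds = new byte[n + 1][];
        bounds[0] = start;
        bounds[n] = end;
        for (int i = 1; i < n; i++) {
            // lo + span * i / n; adjacent points can coincide when span < n.
            BigInteger point = lo.add(
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            bounds[i] = toFixedWidth(point, width);
        }
        return bounds;
    }

    // Converts a non-negative value back to exactly `width` bytes, dropping
    // the sign byte BigInteger may prepend and left-padding with zeros.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

For example, KeyRangeSplitter.split(new byte[]{0x00}, new byte[]{0x10}, 4) yields the boundaries 0x00, 0x04, 0x08, 0x0c, 0x10. As the issue notes, the resulting splits only hold similar row counts if the keys are spread uniformly over the range.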