ShivaKumar SS created HBASE-20056:
-------------------------------------

             Summary: Performance optimization on MultiTableInputFormatBase#getSplits()
                 Key: HBASE-20056
                 URL: https://issues.apache.org/jira/browse/HBASE-20056
             Project: HBase
          Issue Type: Improvement
          Components: hbase, mapreduce
    Affects Versions: 1.0.1
            Reporter: ShivaKumar SS


Currently this method iterates over the list of Scan objects to compute the
splits, and on each iteration it opens a new Connection and closes it again,
which is expensive.

It can be optimized so that a single HBase connection is reused to compute the
splits for all of the Scan objects.
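
The difference between the two approaches can be sketched roughly as follows.
This is not the HBase code or the proposed patch: FakeConnection, splitsFor and
both method names are hypothetical stand-ins, used only to count how many times
a connection is opened. In HBase itself the connection would presumably come
from ConnectionFactory.createConnection(conf).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedConnectionSplits {

    /** Hypothetical stand-in for an expensive cluster connection; counts how often it is opened. */
    static final class FakeConnection implements AutoCloseable {
        static final AtomicInteger OPENED = new AtomicInteger();

        FakeConnection() {
            OPENED.incrementAndGet(); // a real connection would set up its cluster state here
        }

        /** Pretends to compute the splits for one scan (here: two named splits per scan). */
        List<String> splitsFor(String scanName) {
            List<String> splits = new ArrayList<>();
            splits.add(scanName + "-split-0");
            splits.add(scanName + "-split-1");
            return splits;
        }

        @Override
        public void close() {
            // nothing to release in this sketch
        }
    }

    /** Behaviour described in the report: one connection opened and closed per scan. */
    static List<String> splitsPerScanConnection(List<String> scans) {
        List<String> all = new ArrayList<>();
        for (String scan : scans) {
            try (FakeConnection conn = new FakeConnection()) {
                all.addAll(conn.splitsFor(scan));
            }
        }
        return all;
    }

    /** Proposed behaviour: one connection shared by every scan. */
    static List<String> splitsSharedConnection(List<String> scans) {
        List<String> all = new ArrayList<>();
        try (FakeConnection conn = new FakeConnection()) {
            for (String scan : scans) {
                all.addAll(conn.splitsFor(scan));
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> scans = new ArrayList<>();
        for (int i = 0; i < 120; i++) {
            scans.add("scan" + i);
        }

        FakeConnection.OPENED.set(0);
        List<String> perScan = splitsPerScanConnection(scans);
        int perScanOpens = FakeConnection.OPENED.get();

        FakeConnection.OPENED.set(0);
        List<String> shared = splitsSharedConnection(scans);
        int sharedOpens = FakeConnection.OPENED.get();

        System.out.println("per-scan opens: " + perScanOpens + ", shared opens: " + sharedOpens);
        System.out.println("same splits: " + perScan.equals(shared));
    }
}
```

Both variants return the same splits; only the number of connection setups
changes, from one per scan to one per getSplits() call.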

This optimization will help reduce the launch time of MapReduce jobs.

We are using a cluster of 15 nodes and have around 120 Scan objects. Launching
a job takes about 5 minutes; with the optimized code it takes under roughly
30 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
