Re: HBase Performance Improvements?

Something Something Wed, 09 May 2012 08:09:23 -0700

Hey Oliver,

Thanks a "billion" for the response -:)  I will take any code you can
provide even if it's a hack!  I will even send you an Amazon gift card -
not that you care or need it -:)


Can you share some performance statistics?  Thanks again.


On Wed, May 9, 2012 at 8:02 AM, Oliver Meyn (GBIF) <[email protected]> wrote:

> Heya Something,
>
> I had a similar task recently and by far the best way to go about this is
> with bulk loading after pre-splitting your target table.  As you know
> ImportTsv doesn't understand Avro files so I hacked together my own
> ImportAvro class to create the Hfiles that I eventually moved into HBase
> with completebulkload.  I haven't committed my class anywhere because it's
> a pretty ugly hack, but I'm happy to share it with you as a starting point.
>  Doing billions of puts will just drive you crazy.
>
> Cheers,
> Oliver
>
> On 2012-05-09, at 4:51 PM, Something Something wrote:
>
> > I ran the following MR job that reads AVRO files & puts them on HBase.
> > The files have tons of data (billions).  We have a fairly decent size
> > cluster.  When I ran this MR job, it brought down HBase.  When I
> > commented out the Puts on HBase, the job completed in 45 seconds (yes
> > that's seconds).
> >
> > Obviously, my HBase configuration is not ideal.  I am using all the
> > default HBase configurations that come out of Cloudera's distribution:
> > 0.90.4+49.
> >
> > I am planning to read up on the following two:
> >
> > http://hbase.apache.org/book/important_configurations.html
> > http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
> >
> > But can someone quickly take a look and recommend a list of priorities,
> > such as "try this first..."?  That would be greatly appreciated.  As
> > always, thanks for the time.
> >
> >
> > Here's the Mapper. (There's no reducer):
> >
> >
> >
> > public class AvroProfileMapper extends AvroMapper<GenericData.Record, NullWritable> {
> >    private static final Logger logger = LoggerFactory.getLogger(AvroProfileMapper.class);
> >
> >    final private String SEPARATOR = "*";
> >
> >    private HTable table;
> >
> >    private String datasetDate;
> >    private String tableName;
> >
> >    @Override
> >    public void configure(JobConf jobConf) {
> >        super.configure(jobConf);
> >        datasetDate = jobConf.get("datasetDate");
> >        tableName = jobConf.get("tableName");
> >
> >        // Open table for writing
> >        try {
> >            table = new HTable(jobConf, tableName);
> >            table.setAutoFlush(false);
> >            table.setWriteBufferSize(1024 * 1024 * 12);
> >        } catch (IOException e) {
> >            throw new RuntimeException("Failed table construction", e);
> >        }
> >    }
> >
> >    @Override
> >    public void map(GenericData.Record record, AvroCollector<NullWritable> collector,
> >                    Reporter reporter) throws IOException {
> >
> >        String u1 = record.get("u1").toString();
> >
> >        GenericData.Array<GenericData.Record> fields =
> >                (GenericData.Array<GenericData.Record>) record.get("bag");
> >        for (GenericData.Record rec : fields) {
> >            Integer s1 = (Integer) rec.get("s1");
> >            Integer n1 = (Integer) rec.get("n1");
> >            Integer c1 = (Integer) rec.get("c1");
> >            Integer freq = (Integer) rec.get("freq");
> >            if (freq == null) {
> >                freq = 0;
> >            }
> >
> >            String key = u1 + SEPARATOR + n1 + SEPARATOR + c1 + SEPARATOR + s1;
> >            Put put = new Put(Bytes.toBytes(key));
> >            put.setWriteToWAL(false);
> >            put.add(Bytes.toBytes("info"), Bytes.toBytes("frequency"),
> >                    Bytes.toBytes(freq.toString()));
> >            try {
> >                table.put(put);
> >            } catch (IOException e) {
> >                throw new RuntimeException("Error while writing to " + table + " table.", e);
> >            }
> >
> >        }
> >        logger.error("------------  Finished processing user: " + u1);
> >    }
> >
> >    @Override
> >    public void close() throws IOException {
> >        table.close();
> >    }
> >
> > }
>
>
> --
> Oliver Meyn
> Software Developer
> Global Biodiversity Information Facility (GBIF)
> +45 35 32 15 12
> http://www.gbif.org
>
>
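
For anyone following the pre-splitting advice in the quoted reply, here is a
minimal sketch of one way to choose region split keys: sort a sample of row
keys and take evenly spaced quantiles.  It is not Oliver's code.  The class
name SplitKeys, the method fromSample, and the idea that a representative
sample of keys (in the mapper's u1*n1*c1*s1 form) is available are all
assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: derives region split keys from a sample of row keys. */
final class SplitKeys {

    /** Unsigned lexicographic byte order, the order HBase uses for row keys. */
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    /**
     * Returns at most (regions - 1) strictly increasing split keys taken at
     * evenly spaced positions of the sorted sample.
     */
    static byte[][] fromSample(List<String> sampleKeys, int regions) {
        if (regions < 2 || sampleKeys.isEmpty()) {
            return new byte[0][];
        }
        // Encode as UTF-8, matching Bytes.toBytes(String) in the mapper.
        List<byte[]> sorted = new ArrayList<>();
        for (String key : sampleKeys) {
            sorted.add(key.getBytes(StandardCharsets.UTF_8));
        }
        sorted.sort(BYTE_ORDER);

        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            int index = (int) ((long) i * sorted.size() / regions);
            byte[] candidate = sorted.get(index);
            // Skip duplicates so the split keys stay strictly increasing.
            if (splits.isEmpty()
                    || BYTE_ORDER.compare(splits.get(splits.size() - 1), candidate) < 0) {
                splits.add(candidate);
            }
        }
        return splits.toArray(new byte[0][]);
    }
}
```

The resulting byte[][] is the shape that HBaseAdmin.createTable(HTableDescriptor,
byte[][]) accepts as split keys.  How well the regions balance depends entirely
on how representative the sample is, so treat the result as a starting point.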
