Thanks for the info, Anil. I first tried an MR job that did Puts, based on the
examples at [1], but this was much too slow, as you said. Switching to
writing HFiles directly via HFileOutputFormat solved the issue.

Also, I wanted to post an issue I ran into, in case anyone hits it in
the future. For a table re-write, a reduce phase can be bad, because the MR
framework will try to sort the whole table, potentially multiple TB. You
can avoid this by calling job.setNumReduceTasks(0). However,
HFileOutputFormat.configureIncrementalLoad() also sets up the reducer and
the number of reduce tasks, which may be a bit surprising (at least it was
to me). So the order of the two calls matters:

    // This will have a (potentially long) reduce phase. Bad for large tables.
    job.setNumReduceTasks(0);
    // Overrides the number of reduce tasks set above
    HFileOutputFormat.configureIncrementalLoad(job, hTable);

Instead, this works better for large tables:

    // This will skip reduce phase
    HFileOutputFormat.configureIncrementalLoad(job, hTable);
    job.setNumReduceTasks(0);

Follow this with a major compaction, which will do the sorting for locality.
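
To make the ordering issue concrete, here is a toy model in plain Java with no
HBase or Hadoop dependency. FakeJob and configureLoad are hypothetical
stand-ins, not real APIs: configureLoad models only the one side effect
discussed above (setting the reducer count, here assumed to be one per
region), and the region count of 12 is made up for illustration.

```java
// Toy stand-in for a MapReduce job. This is NOT the Hadoop Job class.
class FakeJob {
    private int numReduceTasks = 1;

    void setNumReduceTasks(int n) { numReduceTasks = n; }

    int getNumReduceTasks() { return numReduceTasks; }
}

class ReducerOrderDemo {
    // Hypothetical stand-in for configureIncrementalLoad(): it models only
    // the side effect of overwriting the reducer count (assumed here to be
    // one reducer per region).
    static void configureLoad(FakeJob job, int regionCount) {
        job.setNumReduceTasks(regionCount);
    }

    // Sets zero reducers first, then configures: the setting is overwritten.
    static int setThenConfigure(int regionCount) {
        FakeJob job = new FakeJob();
        job.setNumReduceTasks(0);
        configureLoad(job, regionCount);
        return job.getNumReduceTasks();
    }

    // Configures first, then sets zero reducers: the setting survives.
    static int configureThenSet(int regionCount) {
        FakeJob job = new FakeJob();
        configureLoad(job, regionCount);
        job.setNumReduceTasks(0);
        return job.getNumReduceTasks();
    }

    public static void main(String[] args) {
        int regions = 12; // made-up region count for illustration
        System.out.println("set, then configure: " + setThenConfigure(regions));
        System.out.println("configure, then set: " + configureThenSet(regions));
    }
}
```

Whichever call runs last decides the reducer count, so a quick check of
job.getNumReduceTasks() just before submitting the real job is a cheap way to
confirm that the reduce phase will actually be skipped.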

[1] http://hbase.apache.org/0.94/book/mapreduce.example.html

On Tue, Feb 20, 2018 at 6:44 AM, anil gupta <anilgupt...@gmail.com> wrote:

> Hi Marcell,
>
> Since the key is changing, you will need to rewrite the entire table. I think
> generating HFiles (rather than doing puts) will be the most efficient here.
> IIRC, you will need to use HFileOutputFormat in your MR job.
> For locality, I don't think you should worry that much because major
> compaction usually takes care of it. If you want very high locality from
> the beginning, then you can run a major compaction on the new table after
> your initial load.
>
> HTH,
> Anil Gupta
>
> On Mon, Feb 19, 2018 at 11:46 PM, Marcell Ortutay <mortu...@23andme.com>
> wrote:
>
> > I have a large HBase table (~10 TB) that has an existing key structure.
> > Based on some recent analysis, the key structure is causing performance
> > problems for our current query load. I would like to re-write the table
> > with a new key structure that performs substantially better.
> >
> > What is the best way to go about re-writing this table? Since the key
> > structure will change, it will affect locality, so all the data will have
> > to move to a new location. If anyone can point to examples of code that
> > does something like this, that would be very helpful.
> >
> > Thanks,
> > Marcell
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
