> you somehow need to flush all in-memory data *and* perform a
> major compaction

This makes sense. Without compaction the linear HDFS scan isn't
possible. I suppose one could compact 'offline' in a different
MapReduce job, although that would have its own issues.

> The files do have a flag if they were made by a major compaction,
> so you scan only those and ignore the newer ones - but then you are trailing

This could be OK in many cases. The key would be to create a synced
cut-off point, enabling a frozen point-in-time 'view' of the data. I'm
not sure how that would be implemented.

On Wed, Jun 1, 2011 at 6:54 AM, Lars George <[email protected]> wrote:
> Hi Jason,
>
> This was discussed in the past, using the HFileInputFormat. The issue
> is that you somehow need to flush all in-memory data *and* perform a
> major compaction - or else you would need all the logic of the
> ColumnTracker in the HFIF. Since that needs to scan all storage files
> in parallel to achieve its job, the MR task would not really be able
> to use the same approach.
>
> Running a major compaction creates a lot of churn, so it is
> questionable what the outcome is. The files do have a flag if they
> were made by a major compaction, so you scan only those and ignore the
> newer ones - but then you are trailing, and you still do not handle
> delete markers/updates in newer files. No easy feat.
>
> Lars
>
> On Wed, Jun 1, 2011 at 2:41 AM, Jason Rutherglen
> <[email protected]> wrote:
>>> I'd imagine that join operations do not require realtime-ness, and so
>>> faster batch jobs using Hive -> frozen HBase files in HDFS could be
>>> the optimal way to go?
>>
>> In addition to lessening the load on the (perhaps live) RegionServer.
>> There's no Jira for this; I'm tempted to open one.
>>
>> On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen
>> <[email protected]> wrote:
>>>> The Hive-HBase integration allows you to create Hive tables that are
>>>> backed by HBase
>>>
>>> In addition, HBase can be made to go faster for MapReduce jobs if the
>>> HFiles could be used directly in HDFS, rather than proxying through
>>> the RegionServer.
>>>
>>> I'd imagine that join operations do not require realtime-ness, and so
>>> faster batch jobs using Hive -> frozen HBase files in HDFS could be
>>> the optimal way to go?
>>>
>>> On Tue, May 31, 2011 at 1:41 PM, Patrick Angeles <[email protected]>
>>> wrote:
>>>> On Tue, May 31, 2011 at 3:19 PM, Eran Kutner <[email protected]> wrote:
>>>>
>>>>> For my needs I don't really need the general case, but even if I
>>>>> did, I think it can probably be done more simply.
>>>>> The main problem is getting the data from both tables into the same
>>>>> MR job without resorting to lookups. So, lacking the theoretical
>>>>> MultiTableInputFormat, I could copy all the data from both tables
>>>>> into a temp table, appending the source table name to the row keys
>>>>> to make sure there are no conflicts. When all the data from both
>>>>> tables is in the same temp table, run an MR job. For each row, the
>>>>> mapper should emit a key composed of all the values of the join
>>>>> fields in that row (the value can be emitted as is). This will
>>>>> cause all the rows from both tables with the same join-field values
>>>>> to arrive at the reducer together. The reducer can then iterate
>>>>> over them and produce the Cartesian product as needed.
>>>>>
>>>>> I still don't like having to copy all the data into a temp table
>>>>> just because I can't feed two tables into the MR job.
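
A minimal sketch of the mapper and reducer that Eran's temp-table
approach implies, assuming a single join column "f:joinkey" and row
keys prefixed "A:"/"B:" for the two source tables. All class, table,
and column names here are hypothetical:

// Sketch of the MR job Eran describes. Assumes the temp table's row
// keys were prefixed with their source table ("A:" or "B:") and that
// there is a single join column "f:joinkey". Names are hypothetical.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ReduceSideJoin {

  // Emit the join-field value as the key so matching rows from both
  // source tables meet at the same reducer. (Wired up with
  // TableMapReduceUtil.initTableMapperJob over the temp table.)
  static class JoinMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result columns,
        Context ctx) throws IOException, InterruptedException {
      String rowKey = Bytes.toString(row.get()); // e.g. "A:row17"
      byte[] joinVal =
          columns.getValue(Bytes.toBytes("f"), Bytes.toBytes("joinkey"));
      // The row key already carries its source prefix, so it doubles as
      // the tag the reducer needs; a real job would also emit whatever
      // columns the joined output requires.
      ctx.write(new Text(Bytes.toString(joinVal)), new Text(rowKey));
    }
  }

  // Split each key group by source table and emit the Cartesian product.
  static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text joinKey, Iterable<Text> rows, Context ctx)
        throws IOException, InterruptedException {
      List<String> left = new ArrayList<String>();
      List<String> right = new ArrayList<String>();
      for (Text row : rows) {
        if (row.toString().startsWith("A:")) left.add(row.toString());
        else right.add(row.toString());
      }
      for (String l : left)
        for (String r : right)
          ctx.write(joinKey, new Text(l + "|" + r)); // one joined pair
    }
  }
}

Note that the reducer buffers an entire key group in memory before
emitting the cross product, which is exactly where Ted's warning below
about the Cartesian product bites.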
>>>>
>>>> Loading the smaller table into memory is called a map join, versus a
>>>> reduce-side join (a.k.a. common join). One reason to prefer a map
>>>> join is that you avoid the shuffle phase, which potentially involves
>>>> several trips to disk for the intermediate records due to spills,
>>>> plus one trip through the network to get each intermediate KV pair
>>>> to the right reducer. With a map join, everything is local, except
>>>> for the part where you load the small table.
>>>>
>>>>> As Jason Rutherglen mentioned above, Hive can do joins. I don't
>>>>> know if it can do them for HBase, and it will not suit my needs,
>>>>> but it would be interesting to know how it does them, if anyone
>>>>> knows.
>>>>
>>>> The Hive-HBase integration allows you to create Hive tables that are
>>>> backed by HBase. You can do joins on those tables (and also with
>>>> standard Hive tables). It might be worth trying out in your case, as
>>>> it lets you easily see the load characteristics and the job runtime
>>>> without much coding investment.
>>>>
>>>> There are probably some specific optimizations that can be applied
>>>> to your situation, but it's hard to say without knowing your use
>>>> case.
>>>>
>>>> Regards,
>>>>
>>>> - Patrick
>>>>
>>>>> -eran
>>>>>
>>>>> On Tue, May 31, 2011 at 22:02, Ted Dunning <[email protected]> wrote:
>>>>>
>>>>> > The Cartesian product often makes an honest-to-god join not such
>>>>> > a good idea on large data. The common alternative is co-group,
>>>>> > which is basically like doing the hard work of the join but
>>>>> > stopping just before emitting the Cartesian product. This allows
>>>>> > you to inject whatever cleverness you need at that point.
>>>>> >
>>>>> > Common kinds of cleverness include down-sampling of
>>>>> > problematically large sets of candidates.
>>>>> >
>>>>> > On Tue, May 31, 2011 at 11:56 AM, Michael Segel
>>>>> > <[email protected]> wrote:
>>>>> >
>>>>> > > So the underlying problem that the OP was trying to solve was
>>>>> > > how to join two tables from HBase.
>>>>> > > Unfortunately I goofed.
>>>>> > > I gave a quick and dirty solution that is a bit incomplete. The
>>>>> > > row key in the temp table has to be unique, and I forgot about
>>>>> > > the Cartesian product. So my solution wouldn't work in the
>>>>> > > general case.
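
For contrast with the reduce-side sketch above, here is roughly what
the map join Patrick describes looks like when written directly
against HBase rather than through Hive: the small table is loaded into
each map task's memory up front, so the job needs no shuffle or reduce
phase at all. A sketch only, assuming the small table fits in a task's
heap; the table and column names are hypothetical:

// Sketch of a map join as a raw MapReduce job over the big table with
// no reduce phase (setNumReduceTasks(0)). Table and column names are
// hypothetical; assumes the small table fits in a task's heap.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class MapSideJoinMapper extends TableMapper<Text, Text> {
  private final Map<String, String> smallTable =
      new HashMap<String, String>();

  @Override
  protected void setup(Context ctx) throws IOException {
    // Load the small table once per map task. This is the "except for
    // the part where you load the small table" cost Patrick mentions.
    HTable small = new HTable(ctx.getConfiguration(), "small_table");
    ResultScanner scanner = small.getScanner(new Scan());
    try {
      for (Result r : scanner) {
        String joinVal = Bytes.toString(
            r.getValue(Bytes.toBytes("f"), Bytes.toBytes("joinkey")));
        smallTable.put(joinVal, Bytes.toString(r.getRow()));
      }
    } finally {
      scanner.close();
      small.close();
    }
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result columns,
      Context ctx) throws IOException, InterruptedException {
    String joinVal = Bytes.toString(
        columns.getValue(Bytes.toBytes("f"), Bytes.toBytes("joinkey")));
    String match = smallTable.get(joinVal);
    if (match != null) {
      // The join happens right here in the mapper - no shuffle, no
      // reducer, no intermediate spills.
      ctx.write(new Text(Bytes.toString(row.get())), new Text(match));
    }
  }
}

The trade-off Patrick describes falls out directly: all join work is
local to the mapper, at the cost of re-reading the small table once
per task.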

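And on Lars' point that you need to flush all in-memory data and run a
major compaction before HFiles can be scanned directly: a minimal
sketch of how a client could at least request those two steps,
assuming the 0.90-era HBaseAdmin API ("mytable" is a hypothetical
table name). Both calls return without waiting for the work to finish,
which is exactly Jason's unsolved 'synced cut-off point' problem:

// Sketch only: asking HBase to flush and major-compact a table before
// a job scans its HFiles directly. Assumes the 0.90-era HBaseAdmin
// API; "mytable" is a hypothetical table name.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FreezeTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.flush("mytable");        // ask the servers to flush MemStores
    admin.majorCompact("mytable"); // request a major compaction
    // Both calls are asynchronous requests: they return before the
    // work completes, so a job would still need some way to confirm
    // the compaction finished before trusting the on-disk files.
  }
}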