You've hit the nail on the head in terms of challenges. The HDFS interface doesn't provide a way to request a specific data placement strategy for a file. While running the writing workload on a particular node will likely create the first replica of each block on that node, the secondary replicas could land on any node. We're working on a generic way for Drill to operate in this capacity, but it isn't solved yet. There is actually a nice way to solve this in MapR-FS, so we're looking to expose that. For HDFS, someone would need to implement a custom block placement policy to accommodate this capability. If you want to help look at this, we could collaborate to try to expose it in Drill.
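To make that concrete, below is a rough sketch of what such a custom HDFS policy could look like. To be clear about assumptions: the class name, the coLocationKeyOf() helper, and the "_bucket_" naming convention are all made up for illustration, and the chooseTarget() signature follows the (private, unstable, namenode-side) Hadoop 2.6 API, which changes between releases - so treat this as a starting point for discussion, not a drop-in class.

// Hypothetical sketch of a co-locating block placement policy.
// NOTE: BlockPlacementPolicy is a private, unstable server-side API and
// the chooseTarget() signature below matches Hadoop 2.6; adjust it for
// your release.
package org.example.hdfs;  // made-up package

import java.util.List;
import java.util.Set;

import org.apache.hadoop.hdfs.protocol.BlockStoragePolicy;
import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault;
import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;
import org.apache.hadoop.net.Node;

public class CoLocatingPlacementPolicy extends BlockPlacementPolicyDefault {

  @Override
  public DatanodeStorageInfo[] chooseTarget(String srcPath,
      int numOfReplicas, Node writer, List<DatanodeStorageInfo> chosen,
      boolean returnChosenNodes, Set<Node> excludedNodes, long blocksize,
      BlockStoragePolicy storagePolicy) {
    String key = coLocationKeyOf(srcPath);
    if (key != null) {
      // Steer replicas for all files sharing this key to one pinned set of
      // datanodes, so matching buckets of two tables land together. The
      // key-to-datanode mapping (not shown) must be deterministic and
      // survive nodes joining and leaving the cluster.
    }
    // Fall back to the default rack-aware policy for everything else.
    return super.chooseTarget(srcPath, numOfReplicas, writer, chosen,
        returnChosenNodes, excludedNodes, blocksize, storagePolicy);
  }

  // Made-up convention: files carry their co-location key as a
  // "_bucket_NNN" suffix, mirroring Hive-style bucketed file names.
  private String coLocationKeyOf(String srcPath) {
    int idx = srcPath.lastIndexOf("_bucket_");
    return idx < 0 ? null : srcPath.substring(idx);
  }
}

The namenode picks up a custom policy via the dfs.block.replicator.classname property in hdfs-site.xml. The plumbing is the easy part; the hard part is agreeing on how files declare their co-location key and keeping the key-to-datanode mapping stable as the cluster changes.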
On Fri, Feb 13, 2015 at 9:03 AM, Uli Bethke <[email protected]> wrote:

> Thanks Aman. This answers my question. I suppose as a workaround for the
> time being I could denormalize the smaller of the large tables into the
> bigger one.
>
> I would also be interested in the opinions of the group on data
> co-locality (as implemented by Teradata). This is not so much a Drill
> question but rather one related to the distributed file system.
>
> As far as I know, data co-locality of blocks cannot be implemented in
> HDFS. What is the situation in MapR-FS?
>
> On 13/02/2015 16:45, Aman Sinha wrote:
>
>> Drill joins (either hash join or merge join) currently generate 3 types
>> of plans: hash distribute both sides of the join; hash distribute the
>> left side and broadcast the right side; or broadcast the right side and
>> don't distribute the left side. These are cost-based decisions based on
>> cardinalities. However, the partitioning information of the table is
>> not currently used because, at least for now, it is not exposed to
>> Drill.
>>
>> This does not mean it cannot be added in the future. Currently, Drill
>> does not maintain a catalog or extensive metadata - it is designed for
>> schema-less data. So in your use case it would end up re-distributing
>> the data. But we have plans for exposing the distribution information
>> to the planner.
>>
>> On Fri, Feb 13, 2015 at 8:37 AM, Uli Bethke <[email protected]> wrote:
>>
>>> Thanks Jason.
>>>
>>> Just a bit more background on my question. Modern MPPs such as
>>> Teradata allow for full data co-locality via hash distribution of
>>> keys. This ensures that the join data of two large tables will always
>>> end up on the same node, so data co-locality is always guaranteed (no
>>> network overhead), which makes large table joins ultra fast. In Hive
>>> we have bucketing, which somewhat alleviates the problem. However, it
>>> does not guarantee data co-locality, as two corresponding bucket files
>>> may end up on two different nodes. While it reduces network overhead,
>>> it does not eliminate it.
>>>
>>> On 13/02/2015 16:25, Jason Altekruse wrote:
>>>
>>>> I don't think this actually answers your question. You can limit your
>>>> filters by directory to avoid reads from the filesystem, and some of
>>>> the storage plugins like HBase and Hive implement scan-level
>>>> pushdown, but I do not know if this is sophisticated enough that a
>>>> join would be aware of the partitioning. I'll keep watching the
>>>> thread and reach out to our planning experts if they don't chime in.
>>>>
>>>> - Jason
>>>>
>>>> On Fri, Feb 13, 2015 at 5:45 AM, Carol McDonald <[email protected]>
>>>> wrote:
>>>>
>>>>> yes, you can read about it here:
>>>>> https://cwiki.apache.org/confluence/display/DRILL/Partition+Pruning
>>>>>
>>>>> On Fri, Feb 13, 2015 at 6:42 AM, Uli Bethke <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I have two large tables in Hive (both partitioned and bucketed).
>>>>>> Using map-side joins I can take advantage of data locality for the
>>>>>> hash join table.
>>>>>>
>>>>>> Using Drill, does the optimizer take the partitioning and bucketing
>>>>>> into consideration?
>>>>>>
>>>>>> thanks
>>>>>> uli
>>>>>>
>>>
>>> --
>>> ___________________________
>>> Uli Bethke
>>> Co-founder Sonra
>>> p: +353 86 32 83 040
>>> w: www.sonra.io
>>> l: linkedin.com/in/ulibethke
>>> t: twitter.com/ubethke
>
> --
> ___________________________
> Uli Bethke
> Co-founder Sonra
> p: +353 86 32 83 040
> w: www.sonra.io
> l: linkedin.com/in/ulibethke
> t: twitter.com/ubethke
