Hi Prabeesh,

The underlying Beam primitive you use for Join is CoGroupByKey: it takes
N different collections KV<K, V1>, KV<K, V2>, ..., KV<K, VN> and produces
one collection KV<K, [Iterable<V1>, Iterable<V2>, ..., Iterable<VN>]>. This
is a compressed representation of a join result: you can expand it into a
full outer join, use it to implement an inner join, or build many other
join algorithms on top of it.
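The sketch below makes the "compressed representation" point concrete. It is plain Java with no Beam dependency, so it is not the Beam API. It simulates what CoGroupByKey computes for two inputs and shows how an inner join and a full outer join can each be expanded from the grouped result. The class and method names (`CoGroupJoinSketch`, `coGroup`, `innerJoin`, `fullOuterJoin`) and the sample data are hypothetical. In the Beam Java SDK itself, the grouped value is a `CoGbkResult` that you read through `TupleTag`s.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch (NOT the Beam API) of CoGroupByKey semantics for
// N = 2 inputs, and of expanding inner / full outer joins from the grouped result.
class CoGroupJoinSketch {

    // The grouped value for one key: conceptually [Iterable<V1>, Iterable<V2>].
    static final class CoGrouped<V1, V2> {
        final List<V1> left = new ArrayList<>();
        final List<V2> right = new ArrayList<>();
    }

    // Groups both keyed inputs by key; LinkedHashMap keeps first-seen key order.
    static <K, V1, V2> Map<K, CoGrouped<V1, V2>> coGroup(
            List<Map.Entry<K, V1>> lefts, List<Map.Entry<K, V2>> rights) {
        Map<K, CoGrouped<V1, V2>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V1> e : lefts) {
            grouped.computeIfAbsent(e.getKey(), k -> new CoGrouped<>()).left.add(e.getValue());
        }
        for (Map.Entry<K, V2> e : rights) {
            grouped.computeIfAbsent(e.getKey(), k -> new CoGrouped<>()).right.add(e.getValue());
        }
        return grouped;
    }

    // Inner join: cross product of left and right values per key;
    // a key that is missing on either side produces no rows.
    static <K, V1, V2> List<String> innerJoin(Map<K, CoGrouped<V1, V2>> grouped) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<K, CoGrouped<V1, V2>> e : grouped.entrySet()) {
            for (V1 l : e.getValue().left) {
                for (V2 r : e.getValue().right) {
                    rows.add(e.getKey() + ":" + l + "," + r);
                }
            }
        }
        return rows;
    }

    // Full outer join: same cross product, but an empty side is padded with null.
    static <K, V1, V2> List<String> fullOuterJoin(Map<K, CoGrouped<V1, V2>> grouped) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<K, CoGrouped<V1, V2>> e : grouped.entrySet()) {
            List<V1> lefts = e.getValue().left.isEmpty()
                    ? Collections.<V1>singletonList(null) : e.getValue().left;
            List<V2> rights = e.getValue().right.isEmpty()
                    ? Collections.<V2>singletonList(null) : e.getValue().right;
            for (V1 l : lefts) {
                for (V2 r : rights) {
                    rows.add(e.getKey() + ":" + l + "," + r);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: orders and cities keyed by user name.
        List<Map.Entry<String, String>> orders = List.of(
                Map.entry("alice", "order1"), Map.entry("alice", "order2"), Map.entry("bob", "order3"));
        List<Map.Entry<String, String>> cities = List.of(
                Map.entry("alice", "Paris"), Map.entry("carol", "Oslo"));

        Map<String, CoGrouped<String, String>> grouped = coGroup(orders, cities);
        // prints [alice:order1,Paris, alice:order2,Paris]
        System.out.println(innerJoin(grouped));
        // prints [alice:order1,Paris, alice:order2,Paris, bob:order3,null, carol:null,Oslo]
        System.out.println(fullOuterJoin(grouped));
    }
}
```

Other variants (left or right outer join, semi-join) follow the same pattern. Only the rule for handling a key whose group is empty on one side changes.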

There is also a Join library that does this under the hood:
https://github.com/apache/beam/tree/master/sdks/java/extensions/join-library


Dan

On Wed, May 3, 2017 at 6:30 AM, Prabeesh K. <[email protected]> wrote:

> Hi Dan,
>
> Sorry for the late response.
>
> I agree with you on the use cases that you mentioned.
>
> Please advise, and share any sample code for joining two data sets in
> Beam that share some common keys.
>
> Regards,
> Prabeesh K.
>
> On 6 February 2017 at 10:38, Dan Halperin <[email protected]> wrote:
>
>> Definitely, using BigQuery for what BigQuery is really good at (big scans
>> and cost-based joins) is nearly always a good idea. A strong endorsement of
>> Ankur's answer.
>>
>> Pushing the right amount of work into a database is an art, however --
>> there are some scenarios where you'd rather scan in BQ and join in Beam
>> because the join result is very large and you can better filter it in Beam,
>> or because you need to do some pre-join-filtering based on an external API
>> call (and you don't want to load the results of that API call into
>> BigQuery)...
>>
>> I've only seen a few rare cases of the latter.
>>
>> Thanks,
>> Dan
>>
>> On Sun, Feb 5, 2017 at 9:19 PM, Prabeesh K. <[email protected]> wrote:
>>
>>> Hi Ankur,
>>>
>>> Thank you for your response.
>>>
>>> On 5 February 2017 at 23:59, Ankur Chauhan <[email protected]> wrote:
>>>
>>>> I have found doing joins in BigQuery using SQL is a lot faster and
>>>> easier to iterate on.
>>>>
>>>>
>>>> Ankur Chauhan
>>>> On Sat, Feb 4, 2017 at 22:05 Prabeesh K. <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> What is the best way to join two tables in Apache Beam?
>>>>>
>>>>> Regards,
>>>>> Prabeesh K.
>>>>>
>>>>
>>>
>>
>
