Hi Dan

Thank you for your prompt reply.

Regards,
Prabeesh K.

On 3 May 2017 at 19:23, Dan Halperin <[email protected]> wrote:

> Hi Prabeesh,
>
> The underlying Beam primitive you use for Join is CoGroupByKey – this
> takes N different collections KV<K, V1>, KV<K, V2>, ..., KV<K, VN> and
> produces one collection KV<K, [Iterable<V1>, Iterable<V2>, ...,
> Iterable<VN>]>. This is a compressed representation of a Join result, in
> that you can expand it to a full outer join, an inner join, or many
> other join variants.
>
> There is also a Join library that does this under the hood:
> https://github.com/apache/beam/tree/master/sdks/java/extensions/join-library
>
> Dan
>
> On Wed, May 3, 2017 at 6:30 AM, Prabeesh K. <[email protected]> wrote:
>
>> Hi Dan,
>>
>> Sorry for the late response.
>>
>> I agree with you on the use cases that you mentioned.
>>
>> Please advise, and share any sample code for joining two data sets in
>> Beam that share common keys.
>>
>> Regards,
>> Prabeesh K.
>>
>> On 6 February 2017 at 10:38, Dan Halperin <[email protected]> wrote:
>>
>>> Definitely, using BigQuery for what BigQuery is really good at (big
>>> scans and cost-based joins) is nearly always a good idea. A strong
>>> endorsement of Ankur's answer.
>>>
>>> Pushing the right amount of work into a database is an art, however --
>>> there are some scenarios where you'd rather scan in BQ and join in Beam
>>> because the join result is very large and you can better filter it in Beam,
>>> or because you need to do some pre-join-filtering based on an external API
>>> call (and you don't want to load the results of that API call into
>>> BigQuery)...
>>>
>>> I've only seen a few, rare, cases of the latter.
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Sun, Feb 5, 2017 at 9:19 PM, Prabeesh K. <[email protected]>
>>> wrote:
>>>
>>>> Hi Ankur,
>>>>
>>>> Thank you for your response.
>>>>
>>>> On 5 February 2017 at 23:59, Ankur Chauhan <[email protected]> wrote:
>>>>
>>>>> I have found that doing joins in BigQuery using SQL is a lot faster
>>>>> and easier to iterate upon.
>>>>>
>>>>>
>>>>> Ankur Chauhan
>>>>> On Sat, Feb 4, 2017 at 22:05 Prabeesh K. <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> What is the best way to join two tables in Apache Beam?
>>>>>>
>>>>>> Regards,
>>>>>> Prabeesh K.
>>>>>>
>>>>>
>>>>
>>>
>>
>
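
Note on the CoGroupByKey description in Dan's reply of 3 May 2017: the
grouped shape and its expansion into joins can be illustrated without any
Beam dependency. The sketch below is plain Java over in-memory lists, for
two inputs only. It is not Beam code and not how the join library is
implemented; the names CoGroupSketch, Grouped, coGroup, innerJoin and
fullOuterJoin are hypothetical and chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of the co-grouped shape; hypothetical names, not Beam API. */
class CoGroupSketch {

    /** All values seen for one key, kept separately per input. */
    static final class Grouped<V1, V2> {
        final List<V1> left = new ArrayList<>();
        final List<V2> right = new ArrayList<>();
    }

    /** Groups two keyed inputs into key -> (left values, right values). */
    static <K, V1, V2> Map<K, Grouped<V1, V2>> coGroup(
            List<Map.Entry<K, V1>> left, List<Map.Entry<K, V2>> right) {
        Map<K, Grouped<V1, V2>> out = new LinkedHashMap<>();
        for (Map.Entry<K, V1> e : left) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).left.add(e.getValue());
        }
        for (Map.Entry<K, V2> e : right) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).right.add(e.getValue());
        }
        return out;
    }

    /** Inner join: per key, the cross product of left and right values. */
    static <K, V1, V2> List<String> innerJoin(Map<K, Grouped<V1, V2>> grouped) {
        return expand(grouped, false);
    }

    /** Full outer join: as above, but an empty side is padded with a single null. */
    static <K, V1, V2> List<String> fullOuterJoin(Map<K, Grouped<V1, V2>> grouped) {
        return expand(grouped, true);
    }

    private static <K, V1, V2> List<String> expand(
            Map<K, Grouped<V1, V2>> grouped, boolean padMissing) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<K, Grouped<V1, V2>> e : grouped.entrySet()) {
            List<V1> ls = e.getValue().left;
            List<V2> rs = e.getValue().right;
            if (padMissing && ls.isEmpty()) {
                ls = Collections.singletonList(null);
            }
            if (padMissing && rs.isEmpty()) {
                rs = Collections.singletonList(null);
            }
            for (V1 l : ls) {
                for (V2 r : rs) {
                    rows.add(e.getKey() + ":" + l + "," + r); // one joined row per pair
                }
            }
        }
        return rows;
    }
}
```

With a left input of (a, 1), (b, 2) and a right input of (a, x), (c, y),
innerJoin yields only the row for key a, while fullOuterJoin also yields
rows for b and c with the missing side shown as null. In a real pipeline
the grouping step would be done by CoGroupByKey or the join library linked
in Dan's reply, not by an in-memory map.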
