Hi Dan,

Thank you for your prompt reply.

Regards,
Prabeesh K.

On 3 May 2017 at 19:23, Dan Halperin <[email protected]> wrote:

> Hi Prabeesh,
>
> The underlying Beam primitive you use for Join is CoGroupByKey: this
> takes N different collections KV<K, V1>, KV<K, V2>, ..., KV<K, VN> and
> produces one collection KV<K, [Iterable<V1>, Iterable<V2>, ...,
> Iterable<VN>]>. This is a compressed representation of a Join result, in
> that you can expand it to a full outer join, you can implement inner join,
> and you can implement lots of other join algorithms.
>
> There is also a Join library that does this under the hood:
> https://github.com/apache/beam/tree/master/sdks/java/extensions/join-library
>
> Dan
>
> On Wed, May 3, 2017 at 6:30 AM, Prabeesh K. <[email protected]> wrote:
>
>> Hi Dan,
>>
>> Sorry for the late response.
>>
>> I agree with you on the use cases that you mentioned.
>>
>> Please advise, and share any sample code for joining two data
>> sets in Beam that share some common keys.
>>
>> Regards,
>> Prabeesh K.
>>
>> On 6 February 2017 at 10:38, Dan Halperin <[email protected]> wrote:
>>
>>> Definitely, using BigQuery for what BigQuery is really good at (big
>>> scans and cost-based joins) is nearly always a good idea. A strong
>>> endorsement of Ankur's answer.
>>>
>>> Pushing the right amount of work into a database is an art, however --
>>> there are some scenarios where you'd rather scan in BQ and join in Beam
>>> because the join result is very large and you can better filter it in Beam,
>>> or because you need to do some pre-join-filtering based on an external API
>>> call (and you don't want to load the results of that API call into
>>> BigQuery)...
>>>
>>> I've only seen a few, rare, cases of the latter.
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Sun, Feb 5, 2017 at 9:19 PM, Prabeesh K. <[email protected]>
>>> wrote:
>>>
>>>> Hi Ankur,
>>>>
>>>> Thank you for your response.
>>>>
>>>> On 5 February 2017 at 23:59, Ankur Chauhan <[email protected]> wrote:
>>>>
>>>>> I have found doing joins in BigQuery using SQL is a lot faster and
>>>>> easier to iterate upon.
>>>>>
>>>>>
>>>>> Ankur Chauhan
>>>>> On Sat, Feb 4, 2017 at 22:05 Prabeesh K. <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Which is the better way to join two tables in Apache Beam?
>>>>>>
>>>>>> Regards,
>>>>>> Prabeesh K.
>>>>>>
>>>>>
>>>>
>>>
>>
>
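
For readers following the CoGroupByKey description quoted above, here is a rough plain-Java sketch of the idea. It deliberately does not use the Beam SDK: it only models the grouped shape KV<K, [Iterable<V1>, Iterable<V2>]> with in-memory collections for two inputs, and shows how that shape can be expanded into an inner join or a full outer join. The names CoGroupSketch, Grouped, kv, coGroup, innerJoin and fullOuterJoin are hypothetical and chosen for illustration; they are not Beam or join-library APIs.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the grouped result described in the thread (NOT the Beam API).
class CoGroupSketch {

    /** All values that share one key: models [Iterable<V1>, Iterable<V2>]. */
    static final class Grouped {
        final List<String> left = new ArrayList<>();
        final List<String> right = new ArrayList<>();
    }

    /** Hypothetical helper that builds one key/value pair. */
    static Map.Entry<String, String> kv(String key, String value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    /** Groups both inputs by key; a key present on one side only keeps an empty list for the other. */
    static Map<String, Grouped> coGroup(List<Map.Entry<String, String>> left,
                                        List<Map.Entry<String, String>> right) {
        Map<String, Grouped> out = new TreeMap<>(); // sorted keys, so output order is stable
        for (Map.Entry<String, String> e : left) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped()).left.add(e.getValue());
        }
        for (Map.Entry<String, String> e : right) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped()).right.add(e.getValue());
        }
        return out;
    }

    /** Inner join: the cross product of both sides per key; keys missing a side produce no rows. */
    static List<String> innerJoin(Map<String, Grouped> grouped) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Grouped> e : grouped.entrySet()) {
            for (String l : e.getValue().left) {
                for (String r : e.getValue().right) {
                    rows.add(e.getKey() + "," + l + "," + r);
                }
            }
        }
        return rows;
    }

    /** Full outer join: as above, but an empty side is padded with a single null value. */
    static List<String> fullOuterJoin(Map<String, Grouped> grouped) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Grouped> e : grouped.entrySet()) {
            Grouped g = e.getValue();
            List<String> ls = g.left.isEmpty() ? Collections.<String>singletonList(null) : g.left;
            List<String> rs = g.right.isEmpty() ? Collections.<String>singletonList(null) : g.right;
            for (String l : ls) {
                for (String r : rs) {
                    rows.add(e.getKey() + "," + l + "," + r);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data: users keyed by id, orders keyed by user id.
        List<Map.Entry<String, String>> users = Arrays.asList(kv("u1", "alice"), kv("u2", "bob"));
        List<Map.Entry<String, String>> orders =
                Arrays.asList(kv("u1", "order-7"), kv("u1", "order-9"), kv("u3", "order-4"));
        Map<String, Grouped> grouped = coGroup(users, orders);
        System.out.println(innerJoin(grouped));
        System.out.println(fullOuterJoin(grouped));
    }
}
```

In this sketch the grouping step is done once and each join flavour is just a different way of expanding the per-key lists, which is the sense in which the grouped form is a "compressed representation" of a join result. In an actual Beam pipeline the grouping would be done by CoGroupByKey (or the join library linked in the thread) rather than by an in-memory map.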
