Hi Prabeesh,

The underlying Beam primitive you use for Join is CoGroupByKey. It takes N different collections KV<K, V1>, KV<K, V2>, ..., KV<K, VN> and produces one collection KV<K, [Iterable<V1>, Iterable<V2>, ..., Iterable<VN>]>. This is a compressed representation of a join result: from it you can expand to a full outer join, implement an inner join, or implement many other join algorithms.

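To make the "compressed representation" point concrete, here is a minimal plain-Java sketch for the N = 2 case. It deliberately does not use the Beam API: in-memory lists stand in for the collections, and CoGroupSketch, Grouped, kv, coGroup, innerJoin and fullOuterJoin are hypothetical names made up for this illustration. It only models the shape of the grouped result and how an inner or full outer join can be expanded from it.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the CoGroupByKey output shape. Not Beam API; all names are illustrative.
class CoGroupSketch {

    // One grouped row: every left value and every right value seen for a single key.
    static final class Grouped<V1, V2> {
        final List<V1> left = new ArrayList<V1>();
        final List<V2> right = new ArrayList<V2>();
    }

    // Small helper to build a key/value pair.
    static <K, V> Map.Entry<K, V> kv(K key, V value) {
        return new AbstractMap.SimpleImmutableEntry<K, V>(key, value);
    }

    // Models the grouping for N = 2: K -> (Iterable<V1>, Iterable<V2>).
    static <K extends Comparable<K>, V1, V2> Map<K, Grouped<V1, V2>> coGroup(
            List<Map.Entry<K, V1>> left, List<Map.Entry<K, V2>> right) {
        Map<K, Grouped<V1, V2>> out = new TreeMap<K, Grouped<V1, V2>>();
        for (Map.Entry<K, V1> e : left) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<V1, V2>()).left.add(e.getValue());
        }
        for (Map.Entry<K, V2> e : right) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<V1, V2>()).right.add(e.getValue());
        }
        return out;
    }

    // Inner join: per key, the cross product of left and right values.
    // A key missing on either side yields no rows because one loop is empty.
    static <K, V1, V2> List<String> innerJoin(Map<K, Grouped<V1, V2>> grouped) {
        List<String> rows = new ArrayList<String>();
        for (Map.Entry<K, Grouped<V1, V2>> e : grouped.entrySet()) {
            for (V1 l : e.getValue().left) {
                for (V2 r : e.getValue().right) {
                    rows.add(e.getKey() + "," + l + "," + r);
                }
            }
        }
        return rows;
    }

    // Full outer join: same cross product, but an empty side is padded with a single null.
    static <K, V1, V2> List<String> fullOuterJoin(Map<K, Grouped<V1, V2>> grouped) {
        List<String> rows = new ArrayList<String>();
        for (Map.Entry<K, Grouped<V1, V2>> e : grouped.entrySet()) {
            Grouped<V1, V2> g = e.getValue();
            List<V1> ls = g.left.isEmpty() ? Collections.<V1>singletonList(null) : g.left;
            List<V2> rs = g.right.isEmpty() ? Collections.<V2>singletonList(null) : g.right;
            for (V1 l : ls) {
                for (V2 r : rs) {
                    rows.add(e.getKey() + "," + l + "," + r);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> users = Arrays.asList(kv(1, "alice"), kv(2, "bob"));
        List<Map.Entry<Integer, String>> orders =
                Arrays.asList(kv(1, "book"), kv(1, "pen"), kv(3, "lamp"));

        Map<Integer, Grouped<String, String>> grouped = coGroup(users, orders);
        System.out.println(innerJoin(grouped));
        System.out.println(fullOuterJoin(grouped));
    }
}
```

The grouped map holds each key once with its two value lists, which is why it is smaller than the expanded join: the cross product is only materialized when you ask for a particular join flavor.
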
There is also a Join library that does this under the hood:
https://github.com/apache/beam/tree/master/sdks/java/extensions/join-library

Dan

On Wed, May 3, 2017 at 6:30 AM, Prabeesh K. <[email protected]> wrote:

> Hi Dan,
>
> Sorry for the late response.
>
> I agreed with you for the use cases that you mentioned.
>
> Advice me and please share if there is any sample code to join two data
> sets in Beam that are sharing some common keys.
>
> Regards,
> Prabeesh K.
>
> On 6 February 2017 at 10:38, Dan Halperin <[email protected]> wrote:
>
>> Definitely, using BigQuery for what BigQuery is really good at (big scans
>> and cost-based joins) is nearly always a good idea. A strong endorsement of
>> Ankur's answer.
>>
>> Pushing the right amount of work into a database is an art, however --
>> there are some scenarios where you'd rather scan in BQ and join in Beam
>> because the join result is very large and you can better filter it in Beam,
>> or because you need to do some pre-join-filtering based on an external API
>> call (and you don't want to load the results of that API call into
>> BigQuery)...
>>
>> I've only seen a few, rare, cases of the latter.
>>
>> Thanks,
>> Dan
>>
>> On Sun, Feb 5, 2017 at 9:19 PM, Prabeesh K. <[email protected]> wrote:
>>
>>> Hi Ankur,
>>>
>>> Thank you for your response.
>>>
>>> On 5 February 2017 at 23:59, Ankur Chauhan <[email protected]> wrote:
>>>
>>>> I have found doing joins in bigquery using sql is a lot faster and
>>>> easier to iterate upon.
>>>>
>>>>
>>>> Ankur Chauhan
>>>> On Sat, Feb 4, 2017 at 22:05 Prabeesh K. <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Which is the better way to join two tables in apache beam?
>>>>>
>>>>> Regards,
>>>>> Prabeesh K.
>>>>>
>>>>
>>>
>>
>
