Scalding's `join` methods (`join`, `leftJoin`, etc.) are the way to join two large tables. They are implemented as a standard map/reduce shuffle join, which scales horizontally, though it does require sending the full dataset across the network from the mappers to the reducers.
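To make the shuffle step concrete, here is a rough sketch in plain Scala collections. It is not Scalding's implementation, and the names `ShuffleJoinSketch`, `partition`, `shuffleJoin` and `reducerLoad` are made up for illustration. It only shows the idea: every record is routed to a reducer by hashing its key, and each reducer then joins the records it received.

```scala
// Conceptual sketch only: plain Scala collections, not Scalding's API or internals.
object ShuffleJoinSketch {
  // "Map" side: route each record to a reducer chosen by hashing its key,
  // so matching keys from both inputs land on the same reducer.
  def partition[K, V](records: Seq[(K, V)], reducers: Int): Map[Int, Seq[(K, V)]] =
    records.groupBy { case (k, _) => java.lang.Math.floorMod(k.hashCode, reducers) }

  // "Reduce" side: each reducer pairs up the values that share a key (inner join).
  def shuffleJoin[K, V, W](
      left: Seq[(K, V)],
      right: Seq[(K, W)],
      reducers: Int
  ): Seq[(K, (V, W))] = {
    val leftParts = partition(left, reducers)
    val rightParts = partition(right, reducers)
    (0 until reducers).flatMap { id =>
      val rightByKey = rightParts.getOrElse(id, Seq.empty).groupBy(_._1)
      for {
        (k, v) <- leftParts.getOrElse(id, Seq.empty)
        (_, w) <- rightByKey.getOrElse(k, Seq.empty)
      } yield (k, (v, w))
    }
  }

  // Number of records each reducer receives for one input.
  def reducerLoad[K, V](records: Seq[(K, V)], reducers: Int): Map[Int, Int] =
    partition(records, reducers).map { case (id, rs) => (id, rs.size) }
}
```

Note that in this sketch every record passes through `partition`, which stands in for the network transfer from mappers to reducers, and that all records for a given key end up on a single reducer.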
If you have skew in your keyspace (some keys appear far more often than others), you can use a skew join, which has special handling for frequently appearing keys. You can tell whether your keyspace is skewed from your Hadoop counters, and from the symptom of a small number of your (many) reducers taking much longer than the others.

On Thu, May 23, 2019 at 3:40 PM Saket Kumar <[email protected]> wrote:
> Thanks for replying to this. Is there any other technique in scalding to
> join two large tables?
>
> On Thursday, May 23, 2019 at 3:15:12 PM UTC-7, Alex Levenson wrote:
>> Yes, we don't have that feature in scalding unfortunately.
>>
>> On Thu, May 23, 2019 at 3:11 PM Rajat Ahuja <[email protected]> wrote:
>>> @Alex It is efficient if data sets are already partitioned so that we do
>>> not pass it through reducers to partition it.
>>> @Saket Scalding Library does not support sorted bucketed join as of
>>> now.
>>>
>>> Thanks
>>> Rajat Ahuja
>>>
>>> On Fri, May 24, 2019 at 3:28 AM 'Alex Levenson' via Scalding Development
>>> <[email protected]> wrote:
>>>> I'm not very familiar with that. I did some googling, it looks like
>>>> that's for merging two already sorted datasets, is that right?
>>>>
>>>> On Thu, May 23, 2019 at 2:34 PM Saket Kumar <[email protected]>
>>>> wrote:
>>>>> There is a feature in Hive to do Sorted Merge Bucket join. How can
>>>>> this be implemented in Scalding?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Scalding Development" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/scalding-dev/fc7f4c54-651c-4ef8-aae1-5798c206b9fa%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/scalding-dev/fc7f4c54-651c-4ef8-aae1-5798c206b9fa%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>>> For more options, visit https://groups.google.com/d/optout.

--
Alex Levenson
@THISWILLWORK
