Hi,

Sorry for taking so long to get back to you on this.

The reason for the switch (deprecating the original MapsideJoinStrategy constructor and adding the create() method, which reverses the sides) is that all other join strategies in Crunch are designed to have the smaller of the two PCollections given as the left side of the join. The original MapsideJoinStrategy was implemented with the right side loaded into memory, which wasn't in line with the other implementations. To remain backwards compatible with existing code that relies on the right-side-in-memory behavior, while bringing future code in line with the convention that the left-side table is always the smaller of the two, the constructor was deprecated and the create() method was added. The plan is to remove the public MapsideJoinStrategy constructor in a future version of Crunch.

- Gabriel

On Wed, May 11, 2016 at 3:42 AM, 陈竞 <[email protected]> wrote:
> mapsideJoinStrategy.create() uses LoadLeftSideMapsideJoinStrategy; I'm just
> confused about why LoadLeftSideMapsideJoinStrategy is better than the default
> strategy.
>
> According to the annotation, LoadLeftSideMapsideJoinStrategy performs better
> than the default strategy, but I don't know why.
>
> 2016-05-10 11:30 GMT+08:00 David Ortiz <[email protected]>:
>>
>> Try mapsideJoinStrategy.create()
>>
>>
>> On Mon, May 9, 2016, 9:29 PM 陈竞 <[email protected]> wrote:
>>>
>>> Hi, I'm very confused when I use MapsideJoinStrategy. The original
>>> constructor was deprecated, and LoadLeftSideMapsideJoinStrategy was
>>> recommended instead. Its main change is that it loads the left-side table
>>> into memory, but in my case the left side is the larger table. When I want
>>> to use a map-side join, the left-side table is usually too large to store
>>> in memory.
>>>
>>> For example, I have two tables, A and B, and need A left join B, with
>>> size(A) >> size(B). Naturally we want to use a map-side join with A as the
>>> left side and B as the right side, and then load B into memory to process
>>> it. It's very simple.
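>>> However, if we use LoadLeftSideMapsideJoinStrategy, we have to use A as the
>>> right side and B as the left side, which brings no improvement and adds a
>>> reversing DoFn.
>>>
>>> --
>>> 陈竞, High Performance Computer Center, Institute of Computing Technology, Chinese Academy of Sciences
>>> Jing Chen HPCC.ICT.AC China
>
> --
> 陈竞, High Performance Computer Center, Institute of Computing Technology, Chinese Academy of Sciences
> Jing Chen HPCC.ICT.AC China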
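To make the trade-off in this thread concrete, here is a minimal plain-Java sketch (not Crunch code) of a map-side hash join that follows the load-left-side convention: the smaller table is passed as the left side and held in a HashMap, while the larger right side is streamed one record at a time. The class and method names (`LoadLeftSideJoinSketch`, `join`, `swapValues`) and the `Pair` record are hypothetical, and in-memory lists stand in for PTables. It shows the situation from the question: with the small table B on the left, "A left join B" becomes a right outer join with B on the left, and each output value comes back as (B value, A value), so a swap step (the "reversing DoFn") is needed if A's value should come first.

```java
import java.util.*;

// Hypothetical plain-Java sketch of a load-left-side hash join; not Crunch's API.
public class LoadLeftSideJoinSketch {

    /** Simple immutable pair standing in for Crunch's Pair (hypothetical helper). */
    record Pair<A, B>(A first, B second) {}

    /**
     * Joins a small left table (loaded fully into memory) with a large right
     * table (streamed). If rightOuter is true, right records without a match
     * are emitted with a null left value; otherwise only matches are emitted.
     */
    static <K, U, V> List<Pair<K, Pair<U, V>>> join(
            List<Pair<K, U>> smallLeft, List<Pair<K, V>> largeRight, boolean rightOuter) {
        // Build phase: index the small left side by key, keeping duplicate keys.
        Map<K, List<U>> index = new HashMap<>();
        for (Pair<K, U> rec : smallLeft) {
            index.computeIfAbsent(rec.first(), k -> new ArrayList<>()).add(rec.second());
        }
        // Probe phase: stream the large right side without holding it in memory.
        List<Pair<K, Pair<U, V>>> out = new ArrayList<>();
        for (Pair<K, V> rec : largeRight) {
            List<U> matches = index.get(rec.first());
            if (matches != null) {
                for (U u : matches) {
                    out.add(new Pair<K, Pair<U, V>>(rec.first(), new Pair<U, V>(u, rec.second())));
                }
            } else if (rightOuter) {
                out.add(new Pair<K, Pair<U, V>>(rec.first(), new Pair<U, V>(null, rec.second())));
            }
        }
        return out;
    }

    /** Swaps (u, v) to (v, u): the extra "reverse" step discussed in the thread. */
    static <K, U, V> List<Pair<K, Pair<V, U>>> swapValues(List<Pair<K, Pair<U, V>>> joined) {
        List<Pair<K, Pair<V, U>>> out = new ArrayList<>();
        for (Pair<K, Pair<U, V>> rec : joined) {
            out.add(new Pair<K, Pair<V, U>>(rec.first(),
                    new Pair<V, U>(rec.second().second(), rec.second().first())));
        }
        return out;
    }

    public static void main(String[] args) {
        // Large table A and small table B, keyed by id.
        List<Pair<Integer, String>> a = List.of(
                new Pair<>(1, "a1"), new Pair<>(2, "a2"), new Pair<>(3, "a3"));
        List<Pair<Integer, String>> b = List.of(new Pair<>(1, "b1"), new Pair<>(3, "b3"));

        // "A left join B" with B in memory: B is the left side, so do a right outer join.
        List<Pair<Integer, Pair<String, String>>> joined = join(b, a, true);
        // Swap back so A's value comes first, as in "A left join B".
        List<Pair<Integer, Pair<String, String>>> result = swapValues(joined);
        for (Pair<Integer, Pair<String, String>> r : result) {
            System.out.println(r.first() + " -> " + r.second().first() + ", " + r.second().second());
        }
    }
}
```

Running `main` prints one line per row of A, with `null` where B has no matching key. The sketch only covers the in-memory mechanics; memory use is bounded by the small side either way, and the swap is the only extra cost of putting B on the left.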
