We have a case where the partitioned sender spends a long time waiting. From the profile, we saw that the sender waits for 1 hour, while the receiver on the opposite side and its subsequent hash aggregate operator spend 1 hour in process time. (There are 39 sender minor fragments and 7 receiver minor fragments; each sender sends about 8m of data.) After reading the code, I have some opinions and questions; please correct me if I am wrong.
1. What is the design purpose of having the DataTunnel hold 3 semaphores? Is it to throttle the sender side?
2. The sender's wait time in the profile stats does not include the process time of the receiver's subsequent operators, right?
3. Is there any advice on how to speed up the sender and receiver phase?
4. The purpose of the mux exchange is to merge minor fragments that belong to the same machine, in order to save sender buffer memory, right?
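
To make questions 1 and 2 concrete, here is a minimal sketch of how I understand the throttling. It is not Drill's code: the class and method names are hypothetical, and I am assuming the limit behaves like a single semaphore with 3 permits, where the sender takes a permit before each send and the permit is returned only once the receiver is done with the batch. If that reading is right, the time the sender spends blocked on the semaphore is driven by how fast the receiver side drains batches.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a throttled tunnel; NOT Drill's DataTunnel.
public class ThrottledTunnelSketch {

    // Returns {peak number of batches in flight, nanoseconds the sender spent blocked}.
    static long[] simulate(int batches, int permits, long receiverMillisPerBatch)
            throws InterruptedException {
        Semaphore sendingSemaphore = new Semaphore(permits); // assumed: 3 permits
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // A single thread plays the slow receiver plus its downstream operator.
        ExecutorService receiver = Executors.newSingleThreadExecutor();
        long waitedNanos = 0;

        for (int i = 0; i < batches; i++) {
            long start = System.nanoTime();
            sendingSemaphore.acquire();              // sender blocks when all permits are taken
            waitedNanos += System.nanoTime() - start;

            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);

            receiver.submit(() -> {
                try {
                    Thread.sleep(receiverMillisPerBatch); // simulated receiver-side work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    sendingSemaphore.release();      // "ack": frees a slot for the sender
                }
            });
        }

        receiver.shutdown();
        receiver.awaitTermination(1, TimeUnit.MINUTES);
        return new long[] {peak.get(), waitedNanos};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] result = simulate(10, 3, 5);
        System.out.println("peak batches in flight: " + result[0]);
        System.out.println("sender blocked for (ms): " + result[1] / 1_000_000);
    }
}
```

In this toy model the sender never has more than 3 batches outstanding, and its blocked time grows with the receiver's per-batch time. Whether Drill's profile counts that blocked time as sender wait is exactly what I would like confirmed.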