Hi, all:
I have several questions about the Tez shuffle stage:
1) How should I understand "pipelined shuffle"? Is it called that because of the pipelined sort?
I found some comments about pipelined shuffle in
ShuffleScheduler.copySucceeded(), but I still cannot fully understand them:
 * In case of pipelined shuffle, it is quite possible that fetchers pulled
 * the FINAL_UPDATE spill in advance due to smaller output size. In such
 * scenarios, we need to wait until we retrieve all spill details to claim
 * success.
Could you please explain in more detail what this means?
2) Are there any other shuffle modes besides pipelined shuffle, such as the
legacy MapReduce shuffle? (I know that Tez borrows much of the MR shuffle.)
3) Where is the map output data stored? How can its storage be controlled? Are
there any parameters for that?
4) If the map output is stored in memory, how do custom vertices and tasks
fetch it from memory? And if we do not reuse containers, who manages the map
outputs?
5) Does one fetcher correspond to one map output? And does a fetcher pull all
the data produced by one map output in a single pass?
Any reply will be much appreciated.
Maria~.