I have written a Spark Streaming application that reads data from Kafka, converts the JSON data to Parquet, and saves it to HDFS. What puzzles me is that the processing time of the application in YARN mode is 20% to 50% longer than in local mode. My cluster has three nodes, each running a NodeManager, and all three hosts have identical hardware: 40 cores and 256 GB of memory.
Why does this happen, and how can I fix it?

Regards,
Junfeng Chen