Github user ConeyLiu commented on the issue:
https://github.com/apache/spark/pull/19135
First, serialization does not take much time, as the following screenshot shows:
<img width="848" alt="untitled" src="https://user-images.githubusercontent.com/12733256/30067330-1596eb1e-928d-11e7-818a-4a292e601a26.png">
Second, I do not think every executor in a distributed system should be
configured with very few cores and little memory. More processes mean more
communication between processes, which in turn means more data serialization
and deserialization.
Third, thread synchronization only causes performance problems when there are
enough concurrent threads. Our servers have 70 to 80 cores, and the number of
concurrent tasks exceeds that.
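To illustrate the third point, here is a minimal Java sketch. It is not Spark code: `ContentionDemo`, `SharedPool`, `reserve`, and `run` are hypothetical names. Each worker thread makes one lock-protected call per simulated record, and the pool counts how often a thread found the lock already held by another thread. With a single thread that count is always zero. As the thread count grows toward the core counts mentioned above, contended acquisitions, and the blocking they cause, typically become common.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

public class ContentionDemo {
    // Hypothetical shared resource standing in for any per-record synchronized call.
    // This is NOT Spark's memory manager API; it only models "one lock, many callers".
    static final class SharedPool {
        private final ReentrantLock lock = new ReentrantLock();
        private final AtomicLong contended = new AtomicLong();
        private long reserved = 0;

        void reserve(long bytes) {
            // Fast path: take the lock if it is free. Otherwise record the
            // contention and block until the holder releases it.
            if (!lock.tryLock()) {
                contended.incrementAndGet();
                lock.lock();
            }
            try {
                reserved += bytes;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Runs `threads` workers, each calling reserve(1) `callsPerThread` times.
     *  Returns {totalReserved, contendedAcquisitions}. */
    static long[] run(int threads, int callsPerThread) {
        SharedPool pool = new SharedPool();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    pool.reserve(1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join() makes the workers' writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining workers", e);
            }
        }
        return new long[] { pool.reserved, pool.contended.get() };
    }

    public static void main(String[] args) {
        int calls = 100_000;
        for (int threads : new int[] { 1, 8, 80 }) {
            long start = System.nanoTime();
            long[] r = run(threads, calls);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("threads=%d reserved=%d contended=%d time=%dms%n",
                    threads, r[0], r[1], ms);
        }
    }
}
```

The absolute timings and contention counts depend on the hardware and the JVM, and for more than one thread they vary from run to run. Only the single-thread case (zero contention) is deterministic.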
This change is very small and accounts for only a small fraction of each task,
so its effect on total runtime is limited. Even so, in this test case
performance still improved by 5%.