dongjoon-hyun edited a comment on pull request #30876:
URL: https://github.com/apache/spark/pull/30876#issuecomment-751577804


   @mridulm, your comments are valid and I see nothing wrong with them. I agree 
with you on every point, and we can disable this again for the problematic 
cases before the Apache Spark 3.2.0 vote.
   
   To do that, I believe we can agree that we need to identify the problematic 
corner cases in this thread. At the least, we need to provide better 
documentation to the community about this option if some PMC members already 
have implicit knowledge of why this option should be prohibited in a YARN 
environment. That knowledge is invaluable for the community to share. Besides, 
I'm continuing this discussion because your initial concerns are crucial to 
the community. AFAIK, nobody else has shared these concerns explicitly before.
   
   1. [Especially in context of dynamic resource allocation, it can become very 
chatty when executor's start getting 
dropped.](https://github.com/apache/spark/pull/30876#discussion_r547383304)
   2. [In the past, I found this to be noisy for the cases where replication 
was enabled.](https://github.com/apache/spark/pull/30876#discussion_r548191318)
   
   Could you elaborate on your concerns more specifically?
   1. What are the negative side effects of `very chatty` and `noisy`?
   2. How severe were they?
   
   Again, I'm not trying to protect the default value of the configuration. 
It's just a configuration, and the decision is up to us (you, me, and all the 
community members). It's easy to disable or abandon this, while it's difficult 
to improve it for Apache Spark. I'm trying to understand why this should be 
prohibited in some resource managers or in a normal Spark operating 
environment, and to make Apache Spark better for those cases. That's why I 
tried to go deeper on that part by proposing potential points and asking you 
similar questions in this thread. We will have many more choices in Apache 
Spark 3.2.0 if more of the implicit knowledge is shared.
   
   1. [For the following, Apache Spark usually drop only empty executors. If 
you are saying a storage timeout configuration, I believe that what we need is 
to improve storage timeout configuration behavior after this enabling. I guess 
storage timeout had better not cause any chatty situation, of 
course.](https://github.com/apache/spark/pull/30876#discussion_r547421217)
   2. [I'm trying to understand the risk you mentioned for YARN environment. 
Could you give me more hints about your concerns on this at the YARN dynamic 
allocation situation? We can fix it the behavior and move forward if that's 
valid.](https://github.com/apache/spark/pull/30876#pullrequestreview-557444101)
   3. [This is my focused use case and I love to hear your concerns. You have 
all my ears.](https://github.com/apache/spark/pull/30876#issuecomment-750471287)
   
   So far, I haven't received explicit answers from you. Please let me know if 
I missed something there.

