VitoMakarevich commented on PR #42893:
URL: https://github.com/apache/spark/pull/42893#issuecomment-1718041681

   My reasoning was that currently Spark has `System.gc()` so it should not be 
a problem to add another call simular to it.
   As for comments:
   1. `Impact on performance` - Indeed in some circumstances, it may be a 
problem, but it's better than failing with OOM since millions of objects are 
accumulated. Also, it's done by a single thread, so the blast radius is 
limited. And yet it may be added under a feature flag so users who struggle 
with simular problem may enable it(probably it's minority of users, but they 
may need a fix as well).
   2. `Impact on memory usage` - Yet again you are saying that usually standard 
approach should work and I'm sure it works, but there are cases when it's not 
enough. I assume we can say the same about `System.gc()` - usually you should 
not invoke it since it's costly, but it's here.
   3. `Possible change in expected behavior` - there is no way to define logic 
depending on it as far as I know, it's JVM internal, and it's different from 
`ReferenceQueue` - a mechanism designed for controllable release of resources - 
and that's why `finalization` is evil.
   
   As for `try finally` - for sure this is the best way to do it, but as I said 
- the problem comes from the side library(as I understood from `netty`) and 
java implementation - so you can't apply this pattern there. And the second 
problem is that even if you surround a block with `try finally` if it uses 
objects with non-empty `finalize` - they will go through `Finalizer` thread 
anyway, so there is no way to prevent it in Java 8.
   As for Spark update - you guys see better, maybe it's not a problem in the 
long-term, but as usual with Java - many users will live years on Java 8 as 
living now.
   
   As for GC settings - it may look strange, but I didn't find `any` mechanism 
to control the finalization queue in `any` GC. If you know such - for sure it's 
up to users to change it according to their application. Another argument for 
adding this call is that while the user can tune GC - there is still a 
`System.gc()` in the Spark codebase.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to