huaxingao commented on pull request #29501:
URL: https://github.com/apache/spark/pull/29501#issuecomment-678885037


   I think we need to put the fix in 3.0 because, when the data is already 
cached, this fix makes 3.0 behave the same as 2.4.
   In 2.4
   ```
   cache norm in memory
   ```
   
   Currently in 3.0
   ```
   always cache zipped data (data and norm), regardless of whether the original 
data is cached
   ```
   
   After this fix
   ```
   if (data is cached)
     cache norm in memory and disk
   else 
     cache zipped data (data and norm)
   ```
   
   The double caching in the current 3.0 (input data that is already cached 
gets cached again as part of the zipped data) may cause a performance 
regression from 2.4 to 3.0, so we want to put the fix in 3.0. 
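
To make the three behaviours above easier to compare, here is a minimal pure-Python sketch of the decision logic. It does not use the Spark API: the function names (`cached_in_2_4`, `cached_in_3_0`, `cached_after_fix`, `copies_of_data`) and the returned labels are hypothetical and only model which dataset each version would persist.

```python
def cached_in_2_4(data_is_cached: bool) -> str:
    # 2.4: only the norms are cached, in memory.
    return "norms (memory)"


def cached_in_3_0(data_is_cached: bool) -> str:
    # Current 3.0: the zipped (data, norm) dataset is always cached,
    # whether or not the input data is already cached.
    return "zipped data and norms"


def cached_after_fix(data_is_cached: bool) -> str:
    # After the fix: cache only the norms when the input is already cached;
    # otherwise cache the zipped (data, norm) dataset.
    if data_is_cached:
        return "norms (memory and disk)"
    return "zipped data and norms"


def copies_of_data(data_is_cached: bool, plan: str) -> int:
    # Count how many cached copies of the input rows exist under a plan:
    # one if the caller cached the input, plus one if the zipped dataset
    # (which contains the rows again) is cached too.
    return int(data_is_cached) + int(plan.startswith("zipped"))


if __name__ == "__main__":
    for is_cached in (True, False):
        for name, plan_fn in (("2.4", cached_in_2_4),
                              ("3.0", cached_in_3_0),
                              ("fixed", cached_after_fix)):
            plan = plan_fn(is_cached)
            print(name, is_cached, plan, copies_of_data(is_cached, plan))
```

Under this model, the only case with two cached copies of the input rows is the current 3.0 behaviour with already-cached input, which is the double caching the fix is meant to remove.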
   




