huaxingao commented on pull request #28413:
URL: https://github.com/apache/spark/pull/28413#issuecomment-626116443


   Discussed with @WeichenXu123 offline. We agreed that for backwards 
compatibility we only need to support loading old models, i.e., make HashingTF 
use the old MurmurHash3 when a model saved by a pre-3.0 Spark release is loaded, 
but we don't need to support saving in the old format in the new release. 
However, we do need to throw an exception when such a pre-3.0 model is saved 
again in Spark 3.0.
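   
   For context, here is a minimal self-contained sketch of the version dispatch 
this relies on (the field `hashFuncVersion` and the constant names come from this 
PR; the class, values, and hash bodies below are illustrative stand-ins, not the 
real Murmur3 implementations):
   ```
     object HashingTFVersions {
       // Version tags recorded with the model; the values are illustrative.
       val SPARK_2_MURMUR3_HASH = 1 // models saved by Spark < 3.0
       val SPARK_3_MURMUR3_HASH = 2 // models saved by Spark >= 3.0
     }

     class VersionedHashingTF(val hashFuncVersion: Int, numFeatures: Int) {
       // Pick the hash implementation matching the version the model was saved
       // with, so a loaded pre-3.0 model keeps mapping terms to the same buckets.
       private def hashFunction: Any => Int = hashFuncVersion match {
         case HashingTFVersions.SPARK_2_MURMUR3_HASH =>
           (term: Any) => term.##            // stand-in for the old (2.x) hash
         case HashingTFVersions.SPARK_3_MURMUR3_HASH =>
           (term: Any) => term.toString.##   // stand-in for the new (3.0) hash
         case v =>
           throw new IllegalArgumentException(s"Unknown hash function version: $v")
       }

       // Map a term to its bucket index in [0, numFeatures).
       def indexOf(term: Any): Int =
         java.lang.Math.floorMod(hashFunction(term), numFeatures)
     }
   ```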
   
   I will override the save method and throw an exception if the old hash 
function is found. The current code has no good way to tell the old hash 
function from the new one, so I need the new code in this PR. I will do the 
following:
   ```
     @Since("3.0.0")
     override def save(path: String): Unit = {
       // Refuse to save a model loaded from a pre-3.0 release: it still uses
       // the old hash function, which the new save format does not support.
       require(hashFuncVersion == HashingTF.SPARK_3_MURMUR3_HASH,
         "The hash function needs to be SPARK_3_MURMUR3_HASH, but got " +
           "SPARK_2_MURMUR3_HASH instead.")
       super.save(path)
     }
   ```
   Then, at the end of test("SPARK-23469: Load HashingTF prior to Spark 3.0"), I 
will add:
   ```
       // Re-saving the model loaded from a pre-3.0 release must fail fast.
       intercept[IllegalArgumentException] {
         loadedHashingTF.save(hashingTFPath)
       }
   ```
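   
   For reference, a minimal self-contained ScalaTest sketch of the `intercept` 
pattern above (the suite, the `save` stub, and its message are hypothetical, just 
to show that `require` throws `IllegalArgumentException` and that `intercept` 
captures it):
   ```
     import org.scalatest.funsuite.AnyFunSuite

     class SaveGuardSuite extends AnyFunSuite {
       // Hypothetical stand-in for the guarded save method sketched above.
       def save(hashFuncVersion: Int): Unit =
         require(hashFuncVersion == 2,
           "The hash function needs to be SPARK_3_MURMUR3_HASH")

       test("saving a model loaded from a pre-3.0 release fails fast") {
         // require throws IllegalArgumentException; intercept captures and
         // returns it, and the test fails if nothing is thrown.
         val e = intercept[IllegalArgumentException] { save(hashFuncVersion = 1) }
         assert(e.getMessage.contains("SPARK_3_MURMUR3_HASH"))
       }
     }
   ```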
   Does this approach look OK to you @WeichenXu123 @srowen?
   
   
   

