huaxingao commented on pull request #28413:
URL: https://github.com/apache/spark/pull/28413#issuecomment-626116443
Discussed with @WeichenXu123 offline. We agreed that for backwards
compatibility, we only need to support loading the old model, that is, make
HashingTF use the old MurmurHash3 when a model saved prior to Spark 3.0 is
loaded, but we don't need to support saving in the old format in the new
release. However, we do need to throw an exception if a pre-3.0 model is
saved in Spark 3.0. I will override the save method and throw an exception
if the old hash function is found. In the current code there is no good way
to differentiate the old hash function from the new one, so I think I need
the new code in this PR (the `hashFuncVersion` field). I will do the
following:
```
@Since("3.0.0")
override def save(path: String): Unit = {
  // Refuse to save a model that is still using the legacy Spark 2.x hash function.
  require(hashFuncVersion == HashingTF.SPARK_3_MURMUR3_HASH,
    "The hash function needs to be SPARK_3_MURMUR3_HASH, but got " +
      "SPARK_2_MURMUR3_HASH instead.")
  super.save(path)
}
```
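For reference, here is a minimal self-contained sketch of the idea (this is
not the actual Spark internals; the class and method names are illustrative
stand-ins, and only `hashFuncVersion` and the two version constants mirror
the PR): recording a version tag on the loaded model lets it select the
matching hash function once, and lets `save` inspect the tag later.
```
object HashingTFSketch {
  // Version tags for the two hash implementations (illustrative values).
  val SPARK_2_MURMUR3_HASH = 1
  val SPARK_3_MURMUR3_HASH = 2

  class Model(val hashFuncVersion: Int) {
    // Pick the hash function once, based on the version recorded when the
    // model was loaded. The two lambdas stand in for the real hash functions.
    private val hashFunc: Any => Int = hashFuncVersion match {
      case SPARK_2_MURMUR3_HASH => (term: Any) => term.hashCode
      case SPARK_3_MURMUR3_HASH => (term: Any) => term.toString.hashCode
      case v => throw new IllegalArgumentException(s"Unknown hash version: $v")
    }

    def indexOf(term: Any, numFeatures: Int): Int = {
      // Non-negative modulo, so negative hash codes map into [0, numFeatures).
      val raw = hashFunc(term) % numFeatures
      if (raw < 0) raw + numFeatures else raw
    }
  }

  // A model read from a pre-3.0 save would be constructed with the legacy tag:
  def loadLegacy(): Model = new Model(SPARK_2_MURMUR3_HASH)
}
```
Selecting the function once in a val keeps the per-term hashing path free of
version checks.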
Then, at the end of test("SPARK-23469: Load HashingTF prior to Spark 3.0"), I
will add this:
```
intercept[IllegalArgumentException] {
  loadedHashingTF.save(hashingTFPath)
}
```
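(Note that Scala's `require` throws `IllegalArgumentException` when its
predicate is false, so this intercept matches the check in the `save`
override above.)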
Does this approach look OK to you @WeichenXu123 @srowen?