Let's say I use HashingTF in my Pipeline to hash a string feature.
HashingTF is available in both Python and Scala, but the two hash the
same string to different values, since each uses its own runtime's
native hash implementation. As a result, I get different feature
vectors for the same input. While I can successfully save and load
something like a NaiveBayesModel across the two languages, it seems
like the hashing step doesn't translate.
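
To make the mismatch concrete, here is a minimal sketch in plain Python
(not Spark's actual code). It assumes the hashing trick maps each term
to the index hash(term) mod numFeatures. The names java_string_hash and
hashed_tf are hypothetical helpers for illustration, not Spark APIs;
java_string_hash emulates what String.hashCode returns on the JVM for
simple (BMP-only) strings.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode as a 32-bit signed int.

    Assumption: every character is in the Basic Multilingual Plane,
    so one Python code point corresponds to one UTF-16 code unit.
    """
    h = 0
    for ch in s:
        # Same recurrence as the JVM: h = 31 * h + c, wrapped to 32 bits.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed.
    return h - 0x100000000 if h >= 0x80000000 else h


def hashed_tf(terms, num_features, hash_fn):
    """Hashing-trick term frequencies: {feature index: count}.

    Python's % returns a non-negative result for a positive modulus,
    so negative hash values still map into [0, num_features).
    """
    return dict(Counter(hash_fn(t) % num_features for t in terms))


if __name__ == "__main__":
    terms = ["spark", "hashing", "spark"]
    # Indices a JVM-style string hash would produce.
    print(hashed_tf(terms, 1 << 10, java_string_hash))
    # Indices from Python's built-in hash, which generally differ.
    print(hashed_tf(terms, 1 << 10, hash))
```

The two printed mappings generally disagree on the indices, which is
the effect described above. In Python 3 the built-in string hash is
also randomized per process unless PYTHONHASHSEED is fixed, so the
second mapping can change from run to run as well.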

Is that accurate, or have I completely missed a way to get the same
hashing for the same input across languages?
