Well, then apply md5 to all columns:

ds.select(ds.columns.map(col) ++ ds.columns.map(column => md5(col(column)).as(s"$column hash")): _*).show(false)
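
A self-contained version of that one-liner, for reference. It assumes a SparkSession named spark and an illustrative two-string-column Dataset; the last line is just one possible way to get a single digest per row, by md5-ing the concatenation of all column values (the separator avoids collisions such as ("ab","c") vs ("a","bc")):

import org.apache.spark.sql.functions.{col, concat_ws, md5}

// illustrative Dataset with two string columns
val ds = spark.range(10).selectExpr("cast(id as string) as id", "cast(id * 2 as string) as value")

// md5 of every column, appended as "<column> hash" columns
ds.select(ds.columns.map(col) ++ ds.columns.map(c => md5(col(c)).as(s"$c hash")): _*).show(false)

// single digest per row: md5 over all column values joined with a separator
ds.withColumn("row hash", md5(concat_ws("|", ds.columns.map(col): _*))).show(false)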

Enrico

On 02.03.20 at 11:10, Chetan Khatri wrote:
Thanks Enrico,
I want to compute a hash of all the column values in the row.

On Fri, Feb 28, 2020 at 7:28 PM Enrico Minack <m...@enrico.minack.dev> wrote:

    This computes the md5 hash of a given column id of Dataset ds:

    ds.withColumn("id hash", md5($"id")).show(false)

    Test with this Dataset ds:

    import org.apache.spark.sql.types._
    val ds = spark.range(10).select($"id".cast(StringType))

    The available hash functions are md5, sha, sha1, sha2 and hash:
    https://spark.apache.org/docs/2.4.5/api/sql/index.html
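
    For instance, on the Dataset ds above (sha2 takes a bit length of 224, 256, 384 or 512, and hash is the built-in Murmur3-based function, which accepts multiple columns; the column aliases here are just illustrative):

    import org.apache.spark.sql.functions.{hash, sha2}
    ds.select($"id", sha2($"id", 256).as("sha2 256"), hash($"id").as("hash")).show(false)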

    Enrico


    On 28.02.20 at 13:56, Chetan Khatri wrote:
    > Hi Spark Users,
    > How can I compute a hash of each row and store it in a new column of a
    > Dataframe? Could someone help me.
    >
    > Thanks


