[ https://issues.apache.org/jira/browse/SPARK-39885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Vogelbacher updated SPARK-39885:
--------------------------------------
    Summary: Behavior differs between arrays_overlap and array_contains for negative 0.0  (was: Behavior differs between array_overlap and array_contains for negative 0.0)

> Behavior differs between arrays_overlap and array_contains for negative 0.0
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-39885
>                 URL: https://issues.apache.org/jira/browse/SPARK-39885
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.2
>            Reporter: David Vogelbacher
>            Priority: Major
>
> {{array_contains([0.0], -0.0)}} returns true, but {{arrays_overlap([0.0],
> [-0.0])}} returns false. I think we generally want to treat -0.0 and 0.0
> as the same value (see
> https://github.com/apache/spark/blob/e9eb28e27d10497c8b36774609823f4bbd2c8500/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SQLOrderingUtil.scala#L28).
> However, the {{Double::equals}} method does not: it reports -0.0 and 0.0 as
> unequal. Therefore, we should either make
> [TypeUtils#typeWithProperEquals|https://github.com/apache/spark/blob/e9eb28e27d10497c8b36774609823f4bbd2c8500/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala#L96]
> return false for double, or wrap the comparison in our own equals method
> that handles this case.
>
> Java code snippets showing the issue:
> {code:java}
> // Imports shared by both snippets; sparkSession is an existing SparkSession.
> import java.util.List;
> import com.google.common.collect.ImmutableList;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.RowFactory;
> import org.apache.spark.sql.functions;
> import org.apache.spark.sql.types.DataTypes;
>
> // arrays_overlap: the column holds the array [-0.0], compared against the literal array [+0.0].
> Dataset<Row> dataset = sparkSession.createDataFrame(
>     List.of(RowFactory.create(List.of(-0.0))),
>     DataTypes.createStructType(ImmutableList.of(DataTypes.createStructField(
>         "doubleCol", DataTypes.createArrayType(DataTypes.DoubleType), false))));
> Dataset<Row> df = dataset.withColumn(
>     "overlaps",
>     functions.arrays_overlap(functions.array(functions.lit(+0.0)), dataset.col("doubleCol")));
> List<Row> result = df.collectAsList(); // [[WrappedArray(-0.0),false]]
> {code}
> {code:java}
> // array_contains: the column holds the scalar -0.0, looked up in the literal array [+0.0].
> Dataset<Row> dataset = sparkSession.createDataFrame(
>     List.of(RowFactory.create(-0.0)),
>     DataTypes.createStructType(
>         ImmutableList.of(DataTypes.createStructField("doubleCol", DataTypes.DoubleType, false))));
> Dataset<Row> df = dataset.withColumn(
>     "contains",
>     functions.array_contains(functions.array(functions.lit(+0.0)), dataset.col("doubleCol")));
> List<Row> result = df.collectAsList(); // [[-0.0,true]]
> {code}
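
The root cause named in the description can be reproduced without Spark. The sketch below is only an illustration: the class name and the helpers normalizeZero, overlapsBoxed and overlapsNormalized are hypothetical and are not Spark APIs. It assumes the overlap check ends up comparing boxed doubles through a hash set (that is, via Double::equals), as the description suggests, and shows how folding -0.0 into 0.0 before the comparison would make the two zeros match.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NegativeZeroEquality {
    // Hypothetical helper (not part of Spark): maps -0.0 to 0.0 and
    // returns every other value, including NaN, unchanged.
    static double normalizeZero(double d) {
        return d == 0.0d ? 0.0d : d;
    }

    // Overlap check that relies on Double.hashCode/Double.equals via HashSet.
    static boolean overlapsBoxed(List<Double> left, List<Double> right) {
        Set<Double> seen = new HashSet<>(left);
        for (Double d : right) {
            if (seen.contains(d)) {
                return true;
            }
        }
        return false;
    }

    // Same check, but both sides are normalized first so that -0.0 and 0.0 collide.
    static boolean overlapsNormalized(List<Double> left, List<Double> right) {
        Set<Double> seen = new HashSet<>();
        for (Double d : left) {
            seen.add(normalizeZero(d));
        }
        for (Double d : right) {
            if (seen.contains(normalizeZero(d))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Primitive comparison follows IEEE 754: the two zeros are equal.
        System.out.println(0.0d == -0.0d);
        // Boxed comparison looks at the bit pattern: the two zeros differ.
        System.out.println(Double.valueOf(0.0d).equals(Double.valueOf(-0.0d)));
        System.out.println(overlapsBoxed(List.of(0.0d), List.of(-0.0d)));
        System.out.println(overlapsNormalized(List.of(0.0d), List.of(-0.0d)));
    }
}
```

Normalizing before hashing keeps the fast set-based lookup, whereas the other option from the description (treating double as a type without proper equals) would presumably fall back to a slower element-by-element comparison.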