Enver Osmanov created SPARK-34435:
-------------------------------------

             Summary: ArrayIndexOutOfBoundsException when select in different case
                 Key: SPARK-34435
                 URL: https://issues.apache.org/jira/browse/SPARK-34435
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 3.0.1
         Environment:

Actual behavior: Selecting a column with a different case after a `map` transformation fails with an ArrayIndexOutOfBoundsException.
Expected behavior: Spark shouldn't fail with an ArrayIndexOutOfBoundsException. Spark is case-insensitive by default, so the select should return the selected column.

Test case:
{code:java}
// Assumes a SparkSession is in scope, with import spark.implicits._
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

Additional notes:
The test case is reproducible with Spark 3.0.1. It works fine with Spark 2.4.7.

I believe the problem could be solved by changing the filter in the pruneDataSchema method of the SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}

            Reporter: Enver Osmanov

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
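The proposed change can be illustrated in isolation. The sketch below is plain Scala with no Spark dependency; `Field` is a hypothetical stand-in for Spark's StructField (only the name matters here), and `pruneCaseInsensitive` mirrors the suggested lower-cased comparison:

```scala
// Hypothetical minimal stand-in for org.apache.spark.sql.types.StructField.
case class Field(name: String)

// Mirrors the suggested pruneDataSchema change: lower-case both sides of
// the comparison so a data-schema field "aA" matches a requested "aa".
def pruneCaseInsensitive(mergedSchema: Seq[Field], dataSchema: Seq[Field]): Seq[Field] = {
  val dataSchemaFieldNames = dataSchema.map(_.name.toLowerCase).toSet
  mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase))
}

val merged = Seq(Field("aa"), Field("bb"), Field("cc"))
val data   = Seq(Field("aA"), Field("bb"))

// With the original case-sensitive filter, "aa" would not match "aA" and
// would be dropped, leaving a pruned schema shorter than the attribute
// list -- the kind of mismatch that can surface as an
// ArrayIndexOutOfBoundsException downstream.
val pruned = pruneCaseInsensitive(merged, data)
```

One caveat: unconditional lower-casing would also match when spark.sql.caseSensitive is set to true, so an actual fix would presumably need to respect the analyzer's configured resolver rather than always folding case.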