[jira] [Commented] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
[ https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356550#comment-17356550 ]

Enver Osmanov commented on SPARK-34435:
---------------------------------------

I see it was fixed in [this PR|https://github.com/apache/spark/pull/31993]. The PR links to [the ticket|https://issues.apache.org/jira/browse/SPARK-34897], which says the fix is in Spark 3.0.3, 3.1.2 and 3.2.0. I have tested with Spark 3.1.2 and can confirm the issue is no longer reproducible. Thank you.
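For reference, a self-contained version of the reproduction used for that check is sketched below. The application scaffolding (the SparkSession setup, the local[*] master, and the object name) is an assumption added here and is not part of the original report.

{code:java}
import org.apache.spark.sql.SparkSession

// Top-level case class so that Spark can derive a Product encoder for it.
case class User(aA: String, bb: String)

object Spark34435Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-34435 repro")   // name is arbitrary
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(User("John", "Doe")).toDS().map(identity)

    // Reported to fail with ArrayIndexOutOfBoundsException on Spark 3.0.1;
    // prints the "aa" column on 2.4.7 and, per the comment above, on 3.1.2.
    ds.select("aa").show(false)

    spark.stop()
  }
}
{code}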
[jira] [Commented] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
[ https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284489#comment-17284489 ]

Enver Osmanov commented on SPARK-34435:
---------------------------------------

[~ymajid], that is absolutely fine with me. If you have any questions, please let me know.
[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
[ https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enver Osmanov updated SPARK-34435:
----------------------------------
    Description: 
h5. Actual behavior:
Selecting a column in a different case after remapping fails with ArrayIndexOutOfBoundsException.

h5. Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensitive by default, so the select should return the selected column.

h5. Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

h5. Additional notes:
The test case is reproducible with Spark 3.0.1. There are no errors with Spark 2.4.7.

I believe the problem could be solved by changing the filter in `SchemaPruning#pruneDataSchema` from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}

  was:
h5. Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

h5. Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

h5. Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

h5. Additional notes:
Test case is reproducible with Spark 3.0.1. There are no errors with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}
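Since the expected behaviour relies on Spark's default case insensitivity, it can be worth confirming the analyzer setting in the session where the failure is reproduced. A minimal sketch, assuming an active SparkSession named {{spark}} (for example in spark-shell):

{code:java}
// spark.sql.caseSensitive defaults to false, which is why ds.select("aa")
// is expected to resolve the aA column rather than fail.
println(spark.conf.get("spark.sql.caseSensitive"))   // "false" unless overridden

// With case sensitivity enabled, the lower-case select should instead fail
// during analysis (cannot resolve 'aa') and never reach schema pruning.
spark.conf.set("spark.sql.caseSensitive", "true")
{code}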
[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
[ https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enver Osmanov updated SPARK-34435:
----------------------------------
    Description: 
h5. Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

h5. Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

h5. Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

h5. Additional notes:
Test case is reproducible with Spark 3.0.1. There is no errors with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}

  was:
h5. Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

h5. Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

h5. Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

h5. Additional notes:
Test case is reproducible with Spark 3.0.1. It works fine with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}
[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
[ https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enver Osmanov updated SPARK-34435:
----------------------------------
    Description: 
h5. Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

h5. Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

h5. Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

h5. Additional notes:
Test case is reproducible with Spark 3.0.1. There are no errors with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}

  was:
h5. Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

h5. Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

h5. Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

h5. Additional notes:
Test case is reproducible with Spark 3.0.1. There is no errors with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}
[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
[ https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enver Osmanov updated SPARK-34435:
----------------------------------
    Description: 
h5. Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

h5. Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

h5. Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

h5. Additional notes:
Test case is reproducible with Spark 3.0.1. It works fine with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}

  was:
Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

Additional notes:
Test case is reproduceble with Spark 3.0.1. It works fine with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}
[jira] [Created] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
Enver Osmanov created SPARK-34435:
--------------------------------------

             Summary: ArrayIndexOutOfBoundsException when select in different case
                 Key: SPARK-34435
                 URL: https://issues.apache.org/jira/browse/SPARK-34435
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 3.0.1
         Environment: Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

Additional notes:
Test case is reproduceble with Spark 3.0.1. It works fine with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}
            Reporter: Enver Osmanov
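To illustrate the proposed change in isolation, the sketch below applies both filters to a pair of hand-built schemas. The concrete field names and which schema carries which casing are illustrative assumptions chosen to mirror the test case; this is only a sketch of the idea, not the change that was eventually merged into Spark.

{code:java}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object CaseInsensitiveFilterSketch {
  def main(args: Array[String]): Unit = {
    // Stand-ins: one schema uses the case-class spelling "aA", the other the
    // lower-case spelling "aa" coming from ds.select("aa").
    val dataSchema   = StructType(Seq(StructField("aA", StringType), StructField("bb", StringType)))
    val mergedSchema = StructType(Seq(StructField("aa", StringType)))

    // Case-sensitive filter (the behaviour described above): "aa" does not
    // match "aA", so the field is dropped from the pruned schema.
    val exactNames = dataSchema.fieldNames.toSet
    val pruned = StructType(mergedSchema.filter(f => exactNames.contains(f.name)))
    println(pruned.fieldNames.mkString(", "))          // empty: no fields survive

    // Case-insensitive filter (the proposed change): lower-casing both sides keeps "aa".
    val lowerNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
    val prunedCi = StructType(mergedSchema.filter(f => lowerNames.contains(f.name.toLowerCase)))
    println(prunedCi.fieldNames.mkString(", "))        // aa
  }
}
{code}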
[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
[ https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enver Osmanov updated SPARK-34435:
----------------------------------
    Description: 
Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

Additional notes:
Test case is reproduceble with Spark 3.0.1. It works fine with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code}
[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case
[ https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enver Osmanov updated SPARK-34435:
----------------------------------
    Environment:     (was: Actual behavior:
Select column with different case after remapping fail with ArrayIndexOutOfBoundsException.

Expected behavior:
Spark shouldn't fail with ArrayIndexOutOfBoundsException. Spark is case insensetive by default, so select should return selected column.

Test case:
{code:java}
case class User(aA: String, bb: String)
// ...
val user = User("John", "Doe")
val ds = Seq(user).toDS().map(identity)
ds.select("aa").show(false)
{code}

Additional notes:
Test case is reproduceble with Spark 3.0.1. It works fine with Spark 2.4.7.

I belive problem could be solved by changing filter in pruneDataSchema method from SchemaPruning object from this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
{code}
to this:
{code:java}
val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
val mergedDataSchema =
  StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name.toLowerCase)))
{code})