[jira] [Commented] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case

2021-06-03 Thread Enver Osmanov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356550#comment-17356550
 ] 

Enver Osmanov commented on SPARK-34435:
---

I see it was fixed in [this PR|https://github.com/apache/spark/pull/31993].
The PR links to [the ticket|https://issues.apache.org/jira/browse/SPARK-34897], 
which lists Spark 3.0.3, 3.1.2, and 3.2.0 as fixed versions.

I have tested with Spark 3.1.2 and can confirm the issue is no longer reproducible.
Thank you.

> ArrayIndexOutOfBoundsException when select in different case
> 
>
> Key: SPARK-34435
> URL: https://issues.apache.org/jira/browse/SPARK-34435
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 3.0.1
>Reporter: Enver Osmanov
>Priority: Trivial
>
> h5. Actual behavior:
> Selecting a column by a differently-cased name after a remapping fails with 
> ArrayIndexOutOfBoundsException.
> h5. Expected behavior:
> Spark shouldn't fail with ArrayIndexOutOfBoundsException.
>  Spark is case insensitive by default, so the select should return the selected 
> column.
> h5. Test case:
> {code:java}
> case class User(aA: String, bb: String)
> // ...
> val user = User("John", "Doe")
> val ds = Seq(user).toDS().map(identity)
> ds.select("aa").show(false)
> {code}
> h5. Additional notes:
> The test case is reproducible with Spark 3.0.1. There are no errors with Spark 
> 2.4.7.
> I believe the problem could be solved by changing the filter in 
> `SchemaPruning#pruneDataSchema` from this:
> {code:java}
> val dataSchemaFieldNames = dataSchema.fieldNames.toSet
> val mergedDataSchema =
>   StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
> {code}
> to this:
> {code:java}
> val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
> val mergedDataSchema =
>   StructType(mergedSchema.filter(f => 
> dataSchemaFieldNames.contains(f.name.toLowerCase)))
> {code}
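For illustration, the effect of the proposed change can be sketched with plain Scala collections. This is not Spark's actual Catalyst code: `Field` is a hypothetical stand-in for `StructField`, and the two functions mirror the original and proposed filters above.

```scala
// Hypothetical minimal stand-in for Spark's StructField.
case class Field(name: String)

// Case-sensitive lookup, mirroring the original filter: the data-schema
// name "aA" does not match the merged-schema field "aa".
def pruneCaseSensitive(dataFieldNames: Seq[String], merged: Seq[Field]): Seq[Field] = {
  val names = dataFieldNames.toSet
  merged.filter(f => names.contains(f.name))
}

// Case-insensitive lookup, mirroring the proposed change: both sides are
// lower-cased before comparison, so "aA" matches "aa".
def pruneCaseInsensitive(dataFieldNames: Seq[String], merged: Seq[Field]): Seq[Field] = {
  val names = dataFieldNames.map(_.toLowerCase).toSet
  merged.filter(f => names.contains(f.name.toLowerCase))
}

// Field names as declared in the case class, vs. the merged schema built
// from the (lower-cased) select. The case-sensitive filter silently drops
// "aa", producing the schema mismatch behind the exception; the
// case-insensitive one keeps both fields.
val dataFieldNames = Seq("aA", "bb")
val merged = Seq(Field("aa"), Field("bb"))
```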



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case

2021-02-14 Thread Enver Osmanov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284489#comment-17284489
 ] 

Enver Osmanov commented on SPARK-34435:
---

[~ymajid], that is absolutely fine with me. If you have any questions, 
please let me know.




[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case

2021-02-14 Thread Enver Osmanov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enver Osmanov updated SPARK-34435:
--



[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case

2021-02-14 Thread Enver Osmanov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enver Osmanov updated SPARK-34435:
--



[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case

2021-02-14 Thread Enver Osmanov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enver Osmanov updated SPARK-34435:
--



[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case

2021-02-14 Thread Enver Osmanov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enver Osmanov updated SPARK-34435:
--



[jira] [Created] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case

2021-02-14 Thread Enver Osmanov (Jira)
Enver Osmanov created SPARK-34435:
-

 Summary: ArrayIndexOutOfBoundsException when select in different 
case
 Key: SPARK-34435
 URL: https://issues.apache.org/jira/browse/SPARK-34435
 Project: Spark
  Issue Type: Bug
  Components: Optimizer, SQL
Affects Versions: 3.0.1
Reporter: Enver Osmanov









[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case

2021-02-14 Thread Enver Osmanov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enver Osmanov updated SPARK-34435:
--



[jira] [Updated] (SPARK-34435) ArrayIndexOutOfBoundsException when select in different case

2021-02-14 Thread Enver Osmanov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enver Osmanov updated SPARK-34435:
--