mrk-andreev opened a new pull request, #47522:
URL: https://github.com/apache/spark/pull/47522
When rows do not match the schema, the error message "{actual} is not a valid
external type for schema of {expected}" does not help identify which column
has the problem. I suggest adding information about the source column.
## How to reproduce
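```scala
// SharedSparkContext is assumed to come from the test utilities used in the
// linked example project and to expose `sqlContext`.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{BinaryType, LongType, StringType, StructType}
import org.scalatest.funsuite.AnyFunSuite

class ErrorMsgSuite extends AnyFunSuite with SharedSparkContext {
  test("shouldThrowSchemaError") {
    // Column f3 is declared as StringType, but every row stores Array[Byte] in it.
    val seq: Seq[Row] = Seq(
      Row(
        toBytes("0"),
        toBytes(""),
        1L,
      ),
      Row(
        toBytes("0"),
        toBytes(""),
        1L,
      ),
    )
    val schema: StructType = new StructType()
      .add("f1", BinaryType)
      .add("f3", StringType)
      .add("f2", LongType)
    val df =
      sqlContext.createDataFrame(sqlContext.sparkContext.parallelize(seq), schema)

    // The mismatch surfaces only when an action forces the rows to be encoded.
    val exception = intercept[RuntimeException] {
      df.show()
    }
    assert(
      exception.getCause.getMessage
        .contains("[B is not a valid external type for schema of string")
    )
    // The current message names neither the column (f3) nor its index (1).
    assertResult(
      "[B is not a valid external type for schema of string"
    )(exception.getCause.getMessage)
  }

  // Converts each char to a byte (sufficient for the ASCII input used here).
  def toBytes(x: String): Array[Byte] = x.toCharArray.map(_.toByte)
}
```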
After the fix, the error message also includes the expression that extracts the
offending field (here index 1, column `f3`):
```
[B is not a valid external type for schema of string at getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, f3)
```
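Until a message like this is available, the offending column can also be located on the driver before the rows reach Spark. The sketch below is not part of this PR or of Spark's API. `RowSchemaCheck`, `firstMismatch` and `accepts` are hypothetical names. Only the three types from the reproduction are checked, and each row is assumed to have exactly as many values as the schema has fields.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): reports the first value whose JVM
// class does not match the external type expected for its field.
object RowSchemaCheck {
  def firstMismatch(row: Row, schema: StructType): Option[String] =
    schema.fields.zipWithIndex.collectFirst {
      case (field, i) if !row.isNullAt(i) && !accepts(field.dataType, row.get(i)) =>
        s"${row.get(i).getClass.getName} is not a valid external type for schema of " +
          s"${field.dataType.simpleString} at field ${field.name} (index $i)"
    }

  // Only the types used in the reproduction are covered; anything else passes.
  private def accepts(dt: DataType, v: Any): Boolean = dt match {
    case _: StringType => v.isInstanceOf[String]
    case _: BinaryType => v.isInstanceOf[Array[Byte]]
    case _: LongType   => v.isInstanceOf[Long] // matches a boxed java.lang.Long
    case _             => true
  }
}
```

For the reproduction rows, this would point at `f3` (index 1). It duplicates a small part of the encoder's checks and does not replace them, so it is a debugging aid rather than a substitute for a clearer server-side message.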
Example:
https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]