Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17492#discussion_r109307361
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala ---
    @@ -202,41 +206,54 @@ private[sql] object JsonInferSchema {
     
       private def withCorruptField(
           struct: StructType,
    -      columnNameOfCorruptRecords: String): StructType = {
    -    if (!struct.fieldNames.contains(columnNameOfCorruptRecords)) {
    -      // If this given struct does not have a column used for corrupt records,
    -      // add this field.
    -      val newFields: Array[StructField] =
    -        StructField(columnNameOfCorruptRecords, StringType, nullable = true) +: struct.fields
    -      // Note: other code relies on this sorting for correctness, so don't remove it!
    -      java.util.Arrays.sort(newFields, structFieldComparator)
    -      StructType(newFields)
    -    } else {
    -      // Otherwise, just return this struct.
    +      other: DataType,
    +      columnNameOfCorruptRecords: String,
    +      parseMode: ParseMode) = parseMode match {
    +    case PermissiveMode =>
    +      // If we see any other data type at the root level, we get records that cannot be
    +      // parsed. So, we use the struct as the data type and add the corrupt field to the schema.
    +      if (!struct.fieldNames.contains(columnNameOfCorruptRecords)) {
    +        // If this given struct does not have a column used for corrupt records,
    +        // add this field.
    +        val newFields: Array[StructField] =
    +          StructField(columnNameOfCorruptRecords, StringType, nullable = true) +: struct.fields
    +        // Note: other code relies on this sorting for correctness, so don't remove it!
    +        java.util.Arrays.sort(newFields, structFieldComparator)
    +        StructType(newFields)
    +      } else {
    +        // Otherwise, just return this struct.
    +        struct
    +      }
    +
    +    case DropMalformedMode =>
    +      // If corrupt record handling is disabled we retain the valid schema and discard the other.
           struct
    -    }
    +
    +    case FailFastMode =>
    --- End diff --
    
    It looks like this line can be reached. I added a test at L1936. In more
detail, if the JSON is valid but is not an object or an array of objects, a
non-`StructType` type is inferred for that record. If one of the inferred
types is not a struct type and fail-fast mode is enabled, we will hit this line.
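    
    As a rough illustration of the case described above, here is a minimal
    sketch. It assumes Spark 2.2 or later on the classpath (for
    `DataFrameReader.json(Dataset[String])`); the object name and the sample
    records are hypothetical and are not taken from the test in this PR. The
    outcome is only printed, since the exact failure is not shown in this diff.
    
    ```scala
    import scala.util.Try
    import org.apache.spark.sql.SparkSession
    
    // Hypothetical reproduction sketch; names and data are illustrative only.
    object FailFastRootTypeSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("failfast-root-type-sketch")
          .getOrCreate()
        import spark.implicits._
    
        // The first record is a JSON object. The second is valid JSON but a
        // bare number, so its inferred root type is not a struct.
        val records = Seq("""{"a": 1}""", "1").toDS()
    
        // With no user-supplied schema, the schema is inferred from the
        // records when json(...) is called; wrap the call to capture the result.
        val result = Try(spark.read.option("mode", "FAILFAST").json(records).schema)
    
        // Print either the inferred schema or the failure, without assuming which.
        println(result)
    
        spark.stop()
      }
    }
    ```
    
    Switching the `mode` option to `PERMISSIVE` or `DROPMALFORMED` on the same
    input should exercise the other two branches of the `match` in the diff.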
    
    I am away from my computer right now. I will double-check when I get back to it.

