GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21376#discussion_r189457193
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala ---
    @@ -66,9 +67,13 @@ private[sql] object JsonInferSchema {
                     s"Parse Mode: ${FailFastMode.name}.", e)
               }
             }
    -      }
    -    }.fold(StructType(Nil))(
    -      compatibleRootType(columnNameOfCorruptRecord, parseMode))
    +      }.reduceOption(typeMerger).toIterator
    +    }
    +
    +    // Here we get RDD local iterator then fold, instead of calling `RDD.fold` directly, because
    +    // `RDD.fold` will run the fold function in DAGScheduler event loop thread, which may not have
    +    // active SparkSession and `SQLConf.get` may point to the wrong configs.
    +    val rootType = mergedTypesFromPartitions.toLocalIterator.fold(StructType(Nil))(typeMerger)
    --- End diff --
    
    I re-ran the `JsonBenchmark` and observed no performance regression.
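
To illustrate the reasoning in the code comment above, here is a minimal sketch in plain Scala with no Spark dependency. It assumes the session config behaves like a thread-bound value. `caseSensitive`, the `Set[String]` "schema" and `foldHere` are hypothetical stand-ins, not Spark APIs, and a plain `Seq` stands in for the per-partition results. The same fold gives a different answer depending on which thread runs the merge function.

```scala
object LocalFoldSketch {
  // Hypothetical stand-in for a thread-bound session config; this is not Spark's SQLConf.
  val caseSensitive: ThreadLocal[Boolean] = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  // Hypothetical merger whose result depends on the config visible to the running thread.
  val typeMerger: (Set[String], Set[String]) => Set[String] = (a, b) => {
    val merged = a union b
    if (caseSensitive.get()) merged else merged.map(_.toLowerCase)
  }

  // Per-partition results, as if each partition had already been reduced.
  val mergedTypesFromPartitions: Seq[Set[String]] = Seq(Set("A"), Set("a", "b"))

  // Fold on whichever thread calls this method.
  def foldHere(): Set[String] =
    mergedTypesFromPartitions.iterator.fold(Set.empty[String])(typeMerger)

  def main(args: Array[String]): Unit = {
    // The "session" setting, visible only to the current thread.
    caseSensitive.set(true)

    // Folding on the calling thread sees the session setting.
    assert(foldHere() == Set("A", "a", "b"))

    // Folding on another thread silently falls back to the default setting.
    var other = Set.empty[String]
    val worker = new Thread(() => other = foldHere())
    worker.start()
    worker.join()
    assert(other == Set("a", "b"))
  }
}
```

Under that assumption, pulling the per-partition results back with a local iterator and folding on the calling thread keeps the final merge on a thread that sees the caller's settings.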

