The documentation for the DataFrameReader#json(RDD[String]) method says: "Unless the schema is specified using schema function, this function goes through the input once to determine the input schema."
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader

Why is this extra pass necessary? Why can't Spark create the DataFrame at the same time as it determines the schema? Thanks.
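
For concreteness, here is a minimal sketch of the two cases the documentation seems to distinguish. It assumes a Spark version that provides SparkSession and a local master; the sample records, the field names (name, count) and the object name are made up for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object JsonSchemaExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption: Spark is on the classpath).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-schema-example")
      .getOrCreate()

    // Hypothetical input: one JSON document per RDD element.
    val rdd = spark.sparkContext.parallelize(Seq(
      """{"name": "a", "count": 1}""",
      """{"name": "b", "count": 2}"""
    ))

    // Case 1: no schema supplied, so per the documentation the reader
    // goes through the input once to determine the schema.
    val inferred = spark.read.json(rdd)
    inferred.printSchema()

    // Case 2: schema supplied up front through the schema function,
    // which per the documentation avoids that inference pass.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("count", LongType, nullable = true)
    ))
    val explicit = spark.read.schema(schema).json(rdd)
    explicit.printSchema()

    spark.stop()
  }
}
```

My question is about the first case: why the inference has to be a separate pass over the data rather than part of building the DataFrame.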