Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7971#discussion_r39808267
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -794,6 +797,45 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       }
     
       /**
    +   * Reads in a directory of Avro files from HDFS, a local file system (available on all nodes), or
    +   * any Hadoop-supported file system URI. The records are read in as Generic Avro records. This
    +   * also allows a user to register a schema with Kryo, if they so choose.
    +   *
    +   * You can do the following if you know the schema ahead of time:
    +   * {{{
    +   *   val schema = new Schema.Parser().parse(schemaString)
    +   *   sc.avroFile("/input-path", schema)
    +   * }}}
    +   *
    +   * or just:
    +   * {{{
    +   *   sc.avroFile("/input-path")
    +   * }}}
    +   */
    +  def avroFile(path: String, schemas: Schema*): RDD[GenericRecord] = {
    --- End diff --
    
Hm, so this API allows reading Avro records that have completely unrelated schemas as `GenericRecord`s? Does this mean users may get Avro `GenericRecord`s of different shapes? That seems a bit unintuitive to me. Is this a common case when dealing with Avro files? In particular, is it related to schema merging? Maybe we should restrict this PR to only support reading a single type of Avro record at a time (namely, remove the `*`). We can always provide more powerful APIs later if they prove practical.
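The restricted variant suggested above might look like the following sketch. This is not code from the PR; the exact signatures (a single non-varargs `Schema` parameter plus a schema-less overload) are assumptions about how the API could be narrowed:

```scala
// Hypothetical restriction of the PR's API: accept at most one Schema per
// call instead of a Schema* varargs, so every GenericRecord returned by a
// given call is guaranteed to share a single shape.
def avroFile(path: String, schema: Schema): RDD[GenericRecord] = ...

// Schema-less overload for when the caller does not know the schema ahead
// of time: rely on the writer schema embedded in the Avro files themselves.
def avroFile(path: String): RDD[GenericRecord] = ...
```

Splitting the varargs signature into these two overloads would also keep the common single-schema and no-schema call sites source-compatible with the examples in the scaladoc above.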


