GitHub user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/7971#discussion_r39197676
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -794,6 +797,45 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
}
+  /**
+   * Reads in a directory of Avro files from HDFS, a local file system (available on all nodes),
+   * or any Hadoop-supported file system URI. The records are read in as generic Avro records.
+   * This also allows a user to register one or more schemas with Kryo, if they choose to.
+   *
+   * You can do the following if you know the schema ahead of time:
+   * {{{
+   * val schema = new Schema.Parser().parse(schemaString)
+   * sc.avroFile("/input-path", schema)
+   * }}}
+   *
+   * or just:
+   * {{{
+   * sc.avroFile("/input-path")
+   * }}}
+   */
+  def avroFile(path: String, schemas: Schema*): RDD[GenericRecord] = {
--- End diff --
Not quite familiar with Avro, but why do we need to pass in more than one
`Schema` instance here? Is this because all nested `Schema` instances must also
be registered? If that is the case, it would be nice to document it.
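
To make the question concrete, here is a minimal, hypothetical sketch (the
schema JSON, field names, and the `sc.avroFile(...)` call in the final comment
are illustrative assumptions, not code from this PR) of how one record schema
can contain a nested record schema, i.e. a second distinct `Schema` instance:

```scala
import org.apache.avro.Schema

// Hypothetical schema: a "User" record containing a nested "Address" record.
val schemaJson =
  """{"type": "record", "name": "User", "fields": [
    |  {"name": "name", "type": "string"},
    |  {"name": "address", "type": {"type": "record", "name": "Address",
    |    "fields": [{"name": "city", "type": "string"}]}}
    |]}""".stripMargin

val userSchema: Schema = new Schema.Parser().parse(schemaJson)
// The nested record is a separate Schema instance in its own right:
val addressSchema: Schema = userSchema.getField("address").schema()

// If every nested schema must be registered, a caller would presumably
// have to write something like:
//   sc.avroFile("/input-path", userSchema, addressSchema)
```

If each nested instance really does need to be registered for Kryo, that would
explain the varargs signature, and the Scaladoc should say so.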