[GitHub] spark pull request #21296: [SPARK-24244][SQL] CSV column pruning

HyukjinKwon Thu, 10 May 2018 18:15:29 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21296#discussion_r187499921
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
    @@ -73,11 +64,24 @@ class UnivocityParser(
       // Each input token is placed in each output row's position by mapping these. In this case,
       //
       //   output row - ["A", 2]
    -  private val valueConverters: Array[ValueConverter] =
    -    schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
    +  private val valueConverters: Array[ValueConverter] = {
    +    requiredSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
    +  }
     
    -  private val tokenIndexArr: Array[Int] = {
    -    requiredSchema.map(f => schema.indexOf(f)).toArray
    +  private val tokenizer = {
    +    val parserSetting = options.asParserSettings
    +    if (requiredSchema.length < schema.length) {
    +      val tokenIndexArr = requiredSchema.map(f => java.lang.Integer.valueOf(schema.indexOf(f)))
    +      parserSetting.selectIndexes(tokenIndexArr: _*)
    --- End diff --
    
    I think I tried this locally, but I didn't submit a PR since the improvement was trivial and a test was broken, FWIW.


