[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

gatorsmile Thu, 11 Oct 2018 16:07:04 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22379#discussion_r224629618
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVUtils.scala ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.csv
    +
    +object CSVUtils {
    +  /**
    +   * Filter ignorable rows for CSV iterator (lines empty and starting with `comment`).
    +   * This is currently being used in CSV reading path and CSV schema inference.
    +   */
    +  def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): Iterator[String] = {
    +    iter.filter { line =>
    +      line.trim.nonEmpty && !line.startsWith(options.comment.toString)
    +    }
    +  }
    +
    +  /**
    +   * Helper method that converts string representation of a character to actual character.
    +   * It handles some Java escaped strings and throws exception if given string is longer than one
    +   * character.
    +   */
    +  @throws[IllegalArgumentException]
    +  def toChar(str: String): Char = {
    --- End diff --
    
    Do we need to keep both versions? Can we just let the functions in
    `org.apache.spark.sql.execution.datasources.csv` call the function here?
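
For illustration only, the delegation suggested above might look roughly like the sketch below. This is a self-contained approximation, not code from the PR: `CSVOptions` is reduced to a hypothetical one-field stand-in, and the object names `CatalystCSVUtils` and `ExecutionCSVUtils` are assumptions that stand for the `CSVUtils` objects in the `catalyst.csv` and `execution.datasources.csv` packages respectively.

```scala
// Hypothetical, simplified stand-in for CSVOptions; the real class has many more fields.
final case class CSVOptions(comment: Char = '\u0000')

// Stands for org.apache.spark.sql.catalyst.csv.CSVUtils: the single implementation.
object CatalystCSVUtils {
  // Same filter as in the diff: drop blank lines and lines starting with the comment marker.
  def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): Iterator[String] =
    iter.filter { line =>
      line.trim.nonEmpty && !line.startsWith(options.comment.toString)
    }
}

// Stands for the CSVUtils in org.apache.spark.sql.execution.datasources.csv.
// Instead of keeping a second copy of the logic, it forwards to the catalyst version.
object ExecutionCSVUtils {
  def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): Iterator[String] =
    CatalystCSVUtils.filterCommentAndEmpty(iter, options)
}
```

With this shape, existing callers in the execution package keep their call sites unchanged while only one copy of the filtering logic remains to maintain.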


