Github user falaki commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1351#discussion_r14784417

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -130,6 +131,47 @@ class SQLContext(@transient val sparkContext: SparkContext)
         new SchemaRDD(this, JsonRDD.inferSchema(json, samplingRatio))
       /**
    +   * Loads a CSV file (according to RFC 4180) and returns the result as a [[SchemaRDD]].
    +   *
    +   * NOTE: If there are new line characters inside quoted fields this method may fail to
    +   * parse correctly, because the two lines may be in different partitions. Use
    +   * [[SQLContext#csvRDD]] to parse such files.
    +   *
    +   * @param path path to input file
    +   * @param delimiter Optional delimiter (default is comma)
    +   * @param quote Optional quote character or string (default is '"')
    +   * @param header Optional flag to indicate first line of each file is the header
    +   *               (default is false)
    +   */
    +  def csvFile(path: String,
    +      delimiter: String = ",",
    +      quote: String = "\"",
    --- End diff --

    I just double checked. Pandas accepts a single character but R accepts any length string as quote. I have seen CSVs with double single quote characters, and I think supporting the general case is helpful.
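
    To make the "general case" concrete, here is a minimal sketch of splitting one line when both the delimiter and the quote are arbitrary non-empty strings. It is not the code in this pull request: `QuoteSplit` and `splitLine` are hypothetical names used only for illustration, and the sketch assumes the RFC 4180 convention that a doubled quote inside a quoted field stands for one literal quote.

```scala
object QuoteSplit {
  /** Hypothetical helper: splits a single line into fields.
    * `delimiter` and `quote` may each be a string of any non-zero length.
    */
  def splitLine(line: String,
      delimiter: String = ",",
      quote: String = "\""): Vector[String] = {
    require(delimiter.nonEmpty && quote.nonEmpty, "delimiter and quote must be non-empty")
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      if (line.startsWith(quote, i)) {
        if (inQuotes && line.startsWith(quote, i + quote.length)) {
          // A doubled quote inside a quoted field is one literal quote.
          current.append(quote)
          i += 2 * quote.length
        } else {
          // Opening or closing quote: toggle state, emit nothing.
          inQuotes = !inQuotes
          i += quote.length
        }
      } else if (!inQuotes && line.startsWith(delimiter, i)) {
        // An unquoted delimiter ends the current field.
        fields += current.toString
        current.clear()
        i += delimiter.length
      } else {
        current.append(line.charAt(i))
        i += 1
      }
    }
    fields += current.toString
    fields.result()
  }
}
```

    With `quote = "''"`, the line `a,''b,c'',d` splits into three fields (`a`, `b,c`, `d`), and the default `"` quote behaves as usual. Matching with `startsWith` at an offset is what lets a single-character quote and a longer quote string share one code path.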