[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12904 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...
Github user sureshthalamati commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12904#discussion_r84373258

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -90,6 +90,7 @@ private[csv] class CSVOptions(@transient private val parameters: Map[String, Str
       val permissive = ParseModes.isPermissiveMode(parseMode)

       val nullValue = parameters.getOrElse("nullValue", "")
    +  val emptyValue = parameters.getOrElse("emptyValue", "")
--- End diff --

Yes, null and empty cannot be differentiated when they are set to the same value. Currently the null value check has higher precedence than the empty value check.

input.csv:

    1,
    2,""

Output will be:

    1, null
    2, null

I think this behavior is OK. By default, the Univocity CSV parser used in Spark also returns null for empty strings. I agree we should document this behavior.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
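The precedence described in this comment can be illustrated with a small, self-contained sketch. It does not use Spark or Univocity; `Token`, `substitute`, `convert`, and `read` are hypothetical names introduced only for illustration. It assumes a two-step model: the parser substitutes the configured marker for an empty field, and a later conversion step compares the result against `nullValue` first.

```scala
object NullPrecedenceSketch {
  // Hypothetical stand-in for a parsed CSV field: its text and whether it was quoted.
  final case class Token(text: String, quoted: Boolean)

  // Step 1 (parser-like, assumed): an unquoted empty field takes nullValue,
  // a quoted empty field takes emptyValue; other fields pass through unchanged.
  def substitute(t: Token, nullValue: String, emptyValue: String): String =
    if (t.text.isEmpty) { if (t.quoted) emptyValue else nullValue } else t.text

  // Step 2 (conversion-like, assumed): the null check runs first, so it wins
  // whenever nullValue and emptyValue are the same string.
  def convert(datum: String, nullValue: String): Option[String] =
    if (datum == nullValue) None else Some(datum)

  // Both options default to "", mirroring the defaults shown in the diff.
  def read(t: Token, nullValue: String = "", emptyValue: String = ""): Option[String] =
    convert(substitute(t, nullValue, emptyValue), nullValue)
}
```

Under these assumptions, both rows of the example input map to null with the default settings, and a quoted empty field only survives as an empty string once `nullValue` is set to something other than `""`.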
[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12904#discussion_r84237920

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -90,6 +90,7 @@ private[csv] class CSVOptions(@transient private val parameters: Map[String, Str
       val permissive = ParseModes.isPermissiveMode(parseMode)

       val nullValue = parameters.getOrElse("nullValue", "")
    +  val emptyValue = parameters.getOrElse("emptyValue", "")
--- End diff --

+1 for setting an explicit precedence for both.
[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12904#discussion_r84235107

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -90,6 +90,7 @@ private[csv] class CSVOptions(@transient private val parameters: Map[String, Str
       val permissive = ParseModes.isPermissiveMode(parseMode)

       val nullValue = parameters.getOrElse("nullValue", "")
    +  val emptyValue = parameters.getOrElse("emptyValue", "")
--- End diff --

When `nullValue` and `emptyValue` are both `""` by default, don't they conflict?
[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...
Github user falaki commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12904#discussion_r83341979

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala ---
    @@ -46,6 +46,7 @@ private[sql] abstract class CsvReader(params: CSVOptions, headers: Seq[String])
       settings.setInputBufferSize(params.inputBufferSize)
       settings.setMaxColumns(params.maxColumns)
       settings.setNullValue(params.nullValue)
    +  settings.setEmptyValue("")
--- End diff --

Hard-coding this is not a good idea. Please add a new option in `CSVOptions` and pass it to the parser. The default value could be `""`.
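A minimal sketch of the suggested pattern, reading the value from the options map with a default of `""` and handing it to the parser settings instead of hard-coding it. `ParserSettingsStub`, `OptionsSketch`, and `configure` are hypothetical stand-ins so the example runs without Spark or Univocity; only the setter names `setNullValue` and `setEmptyValue` and the `getOrElse` lookups are taken from the diffs in this thread.

```scala
object EmptyValueOptionSketch {
  // Hypothetical stand-in for the parser settings object; it only records
  // the two values set through the setters seen in the diff.
  final class ParserSettingsStub {
    private var nullV: String = ""
    private var emptyV: String = ""
    def setNullValue(v: String): Unit = { nullV = v }
    def setEmptyValue(v: String): Unit = { emptyV = v }
    def getNullValue: String = nullV
    def getEmptyValue: String = emptyV
  }

  // Hypothetical, reduced version of an options class: each option is looked
  // up in the user-supplied parameters and falls back to "" when absent.
  final class OptionsSketch(parameters: Map[String, String]) {
    val nullValue: String = parameters.getOrElse("nullValue", "")
    val emptyValue: String = parameters.getOrElse("emptyValue", "")
  }

  // Pass the configured values through rather than hard-coding "".
  def configure(params: OptionsSketch): ParserSettingsStub = {
    val settings = new ParserSettingsStub
    settings.setNullValue(params.nullValue)
    settings.setEmptyValue(params.emptyValue)
    settings
  }
}
```

With this shape, a caller who supplies no `emptyValue` gets the same behavior as the hard-coded `""`, while a caller who does supply one sees it reach the parser settings.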
[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...
Github user falaki commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12904#discussion_r83342029

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -555,4 +558,37 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
         assert(numbers.count() == 8)
       }
    +
    +  test("load data with empty quoted string fields.") {
--- End diff --

Would you also add a regression unit-test to make sure this patch also fixes https://issues.apache.org/jira/browse/SPARK-17916?