[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12904


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




2016-10-20 Thread sureshthalamati
Github user sureshthalamati commented on a diff in the pull request:

https://github.com/apache/spark/pull/12904#discussion_r84373258
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -90,6 +90,7 @@ private[csv] class CSVOptions(@transient private val parameters: Map[String, Str
   val permissive = ParseModes.isPermissiveMode(parseMode)
 
   val nullValue = parameters.getOrElse("nullValue", "")
+  val emptyValue = parameters.getOrElse("emptyValue", "")
--- End diff --

Yes, null and empty values cannot be differentiated when they are set to the same value. Currently, the null-value check takes precedence over the empty-value check.

input.csv
1,
2,""

Output will be:
1, null
2, null


I think this behavior is OK. By default, the Univocity CSV parser used in Spark also returns null for empty strings.

I agree we should document this behavior. 
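To make the precedence concrete, here is a small, self-contained Scala model of the behavior described above. It is a hypothetical sketch, not Spark's or Univocity's actual code: `RawField`, `tokenize`, `convert`, and `read` are illustrative names. It assumes, as a simplification, that the tokenizer substitutes `nullValue` for an unquoted empty field and `emptyValue` for a quoted empty field (`""`), and that a later step turns any token equal to `nullValue` into NULL.

```scala
// Hypothetical, simplified model of the two steps (not Spark's code):
// (1) the tokenizer substitutes nullValue/emptyValue for fields with no characters,
// (2) a conversion step turns any token equal to nullValue into NULL (None here).
object NullVsEmptyModel {

  /** One raw field as read from a line: its characters and whether it was quoted. */
  final case class RawField(text: String, quoted: Boolean)

  /** Step 1 (assumed tokenizer behavior): substitution for empty fields. */
  def tokenize(field: RawField, nullValue: String, emptyValue: String): String =
    if (field.text.nonEmpty) field.text
    else if (field.quoted) emptyValue // e.g. the second column of 2,""
    else nullValue                    // e.g. the second column of 1,

  /** Step 2: the null check, applied after substitution, so it wins on a tie. */
  def convert(token: String, nullValue: String): Option[String] =
    if (token == nullValue) None else Some(token)

  def read(field: RawField, nullValue: String, emptyValue: String): Option[String] =
    convert(tokenize(field, nullValue, emptyValue), nullValue)

  def main(args: Array[String]): Unit = {
    val unquotedEmpty = RawField("", quoted = false)
    val quotedEmpty = RawField("", quoted = true)

    // Both options left at "": the two fields are indistinguishable.
    println(read(unquotedEmpty, nullValue = "", emptyValue = "")) // prints: None
    println(read(quotedEmpty, nullValue = "", emptyValue = ""))   // prints: None

    // A distinct nullValue lets the quoted empty field survive as "".
    println(read(quotedEmpty, nullValue = "NA", emptyValue = "")) // prints: Some()
  }
}
```

Because the null check runs after `emptyValue` has already been substituted, any `emptyValue` equal to `nullValue` is read back as NULL, which matches the `1, null` / `2, null` output above.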


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---





2016-10-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12904#discussion_r84237920
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -90,6 +90,7 @@ private[csv] class CSVOptions(@transient private val parameters: Map[String, Str
   val permissive = ParseModes.isPermissiveMode(parseMode)
 
   val nullValue = parameters.getOrElse("nullValue", "")
+  val emptyValue = parameters.getOrElse("emptyValue", "")
--- End diff --

+1 for setting an explicit precedence for both.






2016-10-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/12904#discussion_r84235107
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -90,6 +90,7 @@ private[csv] class CSVOptions(@transient private val parameters: Map[String, Str
   val permissive = ParseModes.isPermissiveMode(parseMode)
 
   val nullValue = parameters.getOrElse("nullValue", "")
+  val emptyValue = parameters.getOrElse("emptyValue", "")
--- End diff --

When `nullValue` and `emptyValue` are both `""` by default, don't they conflict?






2016-10-14 Thread falaki
Github user falaki commented on a diff in the pull request:

https://github.com/apache/spark/pull/12904#discussion_r83341979
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala ---
@@ -46,6 +46,7 @@ private[sql] abstract class CsvReader(params: CSVOptions, headers: Seq[String])
 settings.setInputBufferSize(params.inputBufferSize)
 settings.setMaxColumns(params.maxColumns)
 settings.setNullValue(params.nullValue)
+settings.setEmptyValue("")
--- End diff --

Hard-coding this is not a good idea. Please add a new option in `CSVOptions` and pass it to the parser. The default value could be `""`.
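A minimal sketch of the suggested change, written in plain Scala so it runs without Spark. `SimpleCsvOptions` and `ParserSettings` are hypothetical stand-ins for `CSVOptions` and the parser's settings object (this is not the Univocity API); the only point illustrated is that `emptyValue` is read from the user-supplied options with a `""` default and passed through, instead of being hard-coded at the call site.

```scala
// Hypothetical stand-in for the parser's settings object; not the Univocity API.
final case class ParserSettings(nullValue: String, emptyValue: String)

// Hypothetical, trimmed-down analogue of CSVOptions: each value comes from
// the user-supplied option map and falls back to "" when the key is absent.
final class SimpleCsvOptions(parameters: Map[String, String]) {
  val nullValue: String = parameters.getOrElse("nullValue", "")
  val emptyValue: String = parameters.getOrElse("emptyValue", "")
}

object ParserSetup {
  // Both values are taken from the options object rather than from literals.
  def settingsFor(options: SimpleCsvOptions): ParserSettings =
    ParserSettings(nullValue = options.nullValue, emptyValue = options.emptyValue)

  def main(args: Array[String]): Unit = {
    val defaults = settingsFor(new SimpleCsvOptions(Map.empty))
    val custom = settingsFor(new SimpleCsvOptions(Map("nullValue" -> "NA", "emptyValue" -> "")))
    println(defaults)
    println(custom)
  }
}
```

Keeping the default in the options class means a user who never sets `emptyValue` gets the same behavior as the hard-coded `""`, while the value can still be overridden per read.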






2016-10-14 Thread falaki
Github user falaki commented on a diff in the pull request:

https://github.com/apache/spark/pull/12904#discussion_r83342029
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -555,4 +558,37 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 
 assert(numbers.count() == 8)
   }
+
+  test("load data with empty quoted string fields.") {
--- End diff --

Would you also add a regression unit test to make sure this patch also fixes https://issues.apache.org/jira/browse/SPARK-17916?

