[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-16 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Flags: Patch,Important  (was: Important)

> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
> Environment:  Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 
> 2.11) 
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code:java}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.1, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.
> 
> I have tested the provided code in Databricks runtime environments 5.0 and
> 4.1, and it gives the expected output. However, in Databricks runtimes 4.2 and
> 4.3 (which are running Spark 2.3.1), we get the incorrect output.
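Not part of the ticket, but for anyone who simply needs empty strings written as nothing on an affected runtime, below is a minimal sketch of a possible workaround: normalize empty strings to null before writing, so the CSV writer emits no characters for them and the `emptyValue` handling never comes into play. It assumes a notebook or spark-shell session where `spark` and its implicits are already in scope; the output path is hypothetical.

{code:scala}
// Sketch of a possible workaround (not from the ticket): turn "" into null before
// writing, so the affected emptyValue handling is never exercised.
import org.apache.spark.sql.functions.{col, when}

val df = List((1, ""), (2, "hello"), (3, "hi"), (4, null)).toDF("key", "value")

// Map empty strings in the "value" column to null; other values pass through unchanged.
val cleaned = df.withColumn(
  "value",
  when(col("value") === "", null).otherwise(col("value"))
)

cleaned
  .repartition(1)
  .write
  .mode("overwrite")
  .format("csv")                      // default nullValue writes nulls as nothing, e.g. "1,"
  .save("/tmp/nullcsv_workaround/")   // hypothetical output path
{code}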






[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-16 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Flags: Important  (was: Patch,Important)

> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
> Environment:  Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 
> 2.11) 
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code:java}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.1, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.
> 
> I have tested the provided code in Databricks runtime environments 5.0 and
> 4.1, and it gives the expected output. However, in Databricks runtimes 4.2 and
> 4.3 (which are running Spark 2.3.1), we get the incorrect output.
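As a side note (my reading of the linked commit, not something stated in the ticket): on Spark 2.4+, where `emptyValue` is an official CSV write option, passing an empty string rather than `null` should restore the pre-2.4 output for empty strings. A minimal sketch under that assumption, reusing the `df` from the reproduction:

{code:scala}
// Assumption based on the linked commit: on Spark 2.4+ the write-side emptyValue
// defaults to a quoted empty string; overriding it with "" writes empty strings
// as nothing.
df.repartition(1)
  .write
  .mode("overwrite")
  .option("emptyValue", "")            // write empty strings with no characters instead of ""
  .format("csv")
  .save("/tmp/nullcsv_emptyvalue/")    // hypothetical output path; expect "1," rather than "1,\"\""
{code}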






[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-15 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Description: 
 Example code - 
{code:java}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.1, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.

I have tested the provided code in Databricks runtime environments 5.0 and 4.1,
and it gives the expected output. However, in Databricks runtimes 4.2 and 4.3
(which are running Spark 2.3.1), we get the incorrect output.

  was:
 Example code - 
{code}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.1, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.

I have tested the provided code in Databricks runtime environment 5.0 beta, and
it gives the expected output.


> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
> Environment:  Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 
> 2.11) 
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code:java}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.1, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.
> 
> I have tested the provided code in Databricks runtime environments 5.0 and
> 4.1, and it gives the expected output. However, in Databricks runtimes 4.2 and
> 4.3 (which are running Spark 2.3.1), we get the incorrect output.
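The reproduction uses Databricks `dbutils` to inspect the output. A portable sketch of the same check (my suggestion, not from the ticket) that should work in any spark-shell is to read the written part files back as raw text:

{code:scala}
// Portability sketch: inspect the raw CSV lines without dbutils by reading the
// output directory back as plain text.
val rawLines = spark.read.textFile("/tmp/nullcsv/")   // reads every part file as text
rawLines.collect().foreach(println)                   // e.g. expect "1," rather than "1,\"\""
{code}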






[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-15 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Description: 
 Example code - 
{code}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.1, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.

I have tested the provided code in Databricks runtime environment 5.0 beta, and
it gives the expected output.

  was:
 Example code - 
{code:scala}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.1, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.


> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
> Environment:  Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 
> 2.11) 
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.1, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.
> 
> I have tested the provided code in Databricks runtime environment 5.0 beta,
> and it gives the expected output.






[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-15 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Description: 
 Example code - 
{code:scala}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.1, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.

  was:
 Example code - 
{code:java}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.1, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.


> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
> Environment:  Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 
> 2.11) 
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code:scala}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.1, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.






[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-15 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Affects Version/s: 2.3.1  (was: 2.3.2)

> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
> Environment:  Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 
> 2.11) 
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code:java}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.1, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.
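Since the affected-version field has bounced between 2.3.1 and 2.3.2, one quick sanity check (my suggestion, not from the ticket) is to print the exact Spark version the cluster is running; note that the ticket itself suggests Databricks runtimes 4.2/4.3 carry the 2.4-style behaviour on a 2.3.1 build, so the version string alone may not settle how `emptyValue` is handled.

{code:scala}
// Quick check of the Spark version the session is actually running on.
println(spark.version)   // e.g. "2.3.1" on Databricks runtime 4.2/4.3 per this ticket
{code}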






[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-15 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Description: 
 Example code - 
{code:java}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.1, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.

  was:
 Example code - 
{code:java}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.2, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.


> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
> Environment:  Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 
> 2.11) 
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code:java}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.1, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.






[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-15 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Environment: Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 2.11)  (was: Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 2.11))

> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
> Environment:  Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 
> 2.11) 
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code:java}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.2, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.






[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-15 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Description: 
 Example code - 
{code:java}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.2, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.

> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
> Environment:  Example code - 
> {code:java}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.2, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.
>  
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code:java}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.2, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.






[jira] [Updated] (SPARK-25739) Double quote coming in as empty value even when emptyValue set as null

2018-10-15 Thread Brian Jones (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Jones updated SPARK-25739:

Environment: Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 2.11)

  was:
 Example code - 
{code:java}
val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
df
.repartition(1)
.write
.mode("overwrite")
.option("nullValue", null)
.option("emptyValue", null)
.option("delimiter",",")
.option("quoteMode", "NONE")
.option("escape","\\")
.format("csv")
.save("/tmp/nullcsv/")

var out = dbutils.fs.ls("/tmp/nullcsv/")
var file = out(out.size - 1)
val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
println(x)
{code}
Output - 
{code:java}
1,""
3,hi
2,hello
4,
{code}
Expected output - 
{code:java}
1,
3,hi
2,hello
4,
{code}
 

[https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
 This commit is relevant to my issue.

"Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
version 2.3 and earlier, empty strings are equal to `null` values and do not 
reflect to any characters in saved CSV files."

I am on Spark version 2.3.2, so empty strings should be written as null. Even so,
I am passing the correct "emptyValue" option explicitly. However, my empty values
are still coming out as `""` in the written file.

 


> Double quote coming in as empty value even when emptyValue set as null
> --
>
> Key: SPARK-25739
> URL: https://issues.apache.org/jira/browse/SPARK-25739
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
> Environment:  
> Databricks - 4.2 (includes Apache Spark 2.3.1, Scala 2.11) 
>Reporter: Brian Jones
>Priority: Major
>
>  Example code - 
> {code:java}
> val df = List((1,""),(2,"hello"),(3,"hi"),(4,null)).toDF("key","value")
> df
> .repartition(1)
> .write
> .mode("overwrite")
> .option("nullValue", null)
> .option("emptyValue", null)
> .option("delimiter",",")
> .option("quoteMode", "NONE")
> .option("escape","\\")
> .format("csv")
> .save("/tmp/nullcsv/")
> var out = dbutils.fs.ls("/tmp/nullcsv/")
> var file = out(out.size - 1)
> val x = dbutils.fs.head("/tmp/nullcsv/" + file.name)
> println(x)
> {code}
> Output - 
> {code:java}
> 1,""
> 3,hi
> 2,hello
> 4,
> {code}
> Expected output - 
> {code:java}
> 1,
> 3,hi
> 2,hello
> 4,
> {code}
>  
> [https://github.com/apache/spark/commit/b7efca7ece484ee85091b1b50bbc84ad779f9bfe]
>  This commit is relevant to my issue.
> "Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In 
> version 2.3 and earlier, empty strings are equal to `null` values and do not 
> reflect to any characters in saved CSV files."
> I am on Spark version 2.3.2, so empty strings should be written as null. Even
> so, I am passing the correct "emptyValue" option explicitly. However, my empty
> values are still coming out as `""` in the written file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org