[jira] [Updated] (SPARK-16460) Spark 2.0 CSV ignores NULL value in Date format

2016-09-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16460:
--
Assignee: Liwei Lin

> Spark 2.0 CSV ignores NULL value in Date format
> ---
>
> Key: SPARK-16460
> URL: https://issues.apache.org/jira/browse/SPARK-16460
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: SparkR
>Reporter: Marcel Boldt
>Assignee: Liwei Lin
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>
> Trying to read a CSV file into Spark (using SparkR) that contains just this 
> data row:
> {code}
> 1|1998-01-01||
> {code}
> Using Spark 1.6.2 (Hadoop 2.6) gives me 
> {code}
> > head(sdf)
>   id  d dtwo
> 1  1 1998-01-01   NA
> {code}
> Spark 2.0 preview (Hadoop 2.7, Rev. 14308) fails with error: 
> {panel}
> > Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 
> (TID 0, localhost): java.text.ParseException: Unparseable date: ""
>   at java.text.DateFormat.parse(DateFormat.java:357)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:289)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:98)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:74)
>   at 
> org.apache.spark.sql.execution.datasources.csv.DefaultSource$$anonfun$buildReader$1$$anonfun$apply$1.apply(DefaultSource.scala:124)
>   at 
> org.apache.spark.sql.execution.datasources.csv.DefaultSource$$anonfun$buildReader$1$$anonfun$apply$1.apply(DefaultSource.scala:124)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Itera...
> {panel}
> The problem does indeed seem to be the NULL value: with a valid date in the 
> third CSV column it works.
> R code:
> {code}
> #Sys.setenv(SPARK_HOME = 'c:/spark/spark-1.6.2-bin-hadoop2.6') 
> Sys.setenv(SPARK_HOME = 'C:/spark/spark-2.0.0-preview-bin-hadoop2.7')
> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
> library(SparkR)
> 
> sc <-
> sparkR.init(
> master = "local",
> sparkPackages = "com.databricks:spark-csv_2.11:1.4.0"
> )
> sqlContext <- sparkRSQL.init(sc)
> 
> 
> st <- structType(structField("id", "integer"), structField("d", "date"), 
> structField("dtwo", "date"))
> 
> sdf <- read.df(
> sqlContext,
> path = "d:/date_test.csv",
> source = "com.databricks.spark.csv",
> schema = st,
> inferSchema = "false",
> delimiter = "|",
> dateFormat = "yyyy-MM-dd",
> nullValue = "",
> mode = "PERMISSIVE"
> )
> 
> head(sdf)
> 
> sparkR.stop()
> {code}
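The stack trace points at the likely root cause: {{CSVTypeCast$.castTo}} hands the empty string straight to {{DateFormat.parse}} instead of first comparing it against the configured {{nullValue}} token. A minimal Python sketch of the guard-before-parse idea (the names {{cast_to}} and {{null_value}} are illustrative, not Spark's actual API):

```python
from datetime import datetime

def cast_to(datum, target_type, null_value=""):
    """Convert one raw CSV token to a typed value.

    Comparing against the configured null token *before* any type-specific
    parsing is what avoids a ParseException on empty date fields.
    """
    if datum == null_value:
        return None  # treat the null token as SQL NULL for every type
    if target_type == "date":
        return datetime.strptime(datum, "%Y-%m-%d").date()
    if target_type == "integer":
        return int(datum)
    return datum  # fall back to the raw string

# The failing row from the report: the third field is the null token.
fields = "1|1998-01-01||".split("|")[:3]
parsed = [cast_to(d, t) for d, t in zip(fields, ["integer", "date", "date"])]
# parsed[2] is None rather than raising "Unparseable date"
```

With this ordering, the empty third column maps to NULL (shown as {{NA}} in SparkR), matching the 1.6.2 behavior instead of aborting the job.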



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16460) Spark 2.0 CSV ignores NULL value in Date format

2016-07-09 Thread Marcel Boldt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Boldt updated SPARK-16460:
-
Priority: Minor  (was: Critical)




[jira] [Updated] (SPARK-16460) Spark 2.0 CSV ignores NULL value in Date format

2016-07-09 Thread Marcel Boldt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Boldt updated SPARK-16460:
-
Component/s: (was: Input/Output)




[jira] [Updated] (SPARK-16460) Spark 2.0 CSV ignores NULL value in Date format

2016-07-09 Thread Marcel Boldt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Boldt updated SPARK-16460:
-
Component/s: SQL
