[jira] [Reopened] (SPARK-22439) Not able to get numeric columns for the file having decimal values

2017-11-13 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa reopened SPARK-22439:
--

> Not able to get numeric columns for the file having decimal values
> --
>
> Key: SPARK-22439
> URL: https://issues.apache.org/jira/browse/SPARK-22439
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, SQL
>Affects Versions: 2.2.0
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned decimal values with the header option set to true:
> SourceFile: 
> 8.95977565356765764E+20
> 8.95977565356765764E+20
> 8.95977565356765764E+20
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(HEADER, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> dataset.numericColumns()
> Result: 
> Caused by: java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:223)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:222)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)






[jira] [Commented] (SPARK-22439) Not able to get numeric columns for the file having decimal values

2017-11-13 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249341#comment-16249341
 ] 

Navya Krishnappa commented on SPARK-22439:
--

[~sowen] Thank you for your response.

According to the issue, if we just add a header to the above data, it works fine. I don't 
understand why it does not work with only the header changed. Let me know if you need more input.
SourceFile: 
Column1
8.95977565356765764E+20
8.95977565356765764E+20
8.95977565356765764E+20
Source code1:
Dataset dataset = getSqlContext().read()
.option(PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(HEADER, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(ESCAPE, "\\")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

dataset.numericColumns()
Column1 - decimal(18,-3)
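
For reference, a minimal Scala sketch of this reproduction (the file path and the SparkSession 
setup are assumptions, not taken from the report); it writes the three values under a "Column1" 
header and prints the schema Spark infers, which is where the decimal(18,-3) type above comes from:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("SPARK-22439-sketch").getOrCreate()
import spark.implicits._

// Hypothetical path; the file is the sample above: a "Column1" header plus the values.
val path = "/tmp/decimal-sample.csv"
Seq("Column1", "8.95977565356765764E+20", "8.95977565356765764E+20", "8.95977565356765764E+20")
  .toDF("value").coalesce(1).write.mode("overwrite").text(path)

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(path)

// The report is that Column1 comes back as decimal(18,-3); a negative scale is
// what Dataset.numericColumns later trips over with None.get.
df.printSchema()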

> Not able to get numeric columns for the file having decimal values
> --
>
> Key: SPARK-22439
> URL: https://issues.apache.org/jira/browse/SPARK-22439
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, SQL
>Affects Versions: 2.2.0
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned decimal values with the header option set to true:
> SourceFile: 
> 8.95977565356765764E+20
> 8.95977565356765764E+20
> 8.95977565356765764E+20
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(HEADER, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> dataset.numericColumns()
> Result: 
> Caused by: java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:223)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:222)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)






[jira] [Comment Edited] (SPARK-20387) Permissive mode is not replacing corrupt record with null

2017-11-13 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249301#comment-16249301
 ] 

Navya Krishnappa edited comment on SPARK-20387 at 11/13/17 9:44 AM:


Not all corrupted values are being replaced with null. Refer to the scenario below:

Source File: 
'Col1','Col2','Col3','Col4','Col5','Col6',
'1000','abc','10yui000','400','20.8','2003-03-04',
'1001','xyz','3','4000','20.8','2003-03-04',
'1002','abc','4','40,000','20.8','2003-03-04'
'1003','xyz','5','40,','20.8','2003-03-04'
'1004','abc','6','40,000','20.8','2003-03-04'

User_defined_Schema:
[{
"dataType": "integer",
"type": "Measure",
"name": "Col1"
},
{
"dataType": "string",
"type": "Dimension",
"name": "Col2"
},
{
"dataType": "float",
"type": "Measure",
"name": "Col3"
},
{
"dataType": "string",
"type": "Dimension",
"name": "Col4"
},
{
"dataType": "double",
"type": "Measure",
"name": "Col5"
},
{
"dataType": "date",
"type": "Dimension",
"name": "Col6"
},
{
"dataType": "string",
"type": "Dimension",
"name": "_c6"
}

Source code1:
Dataset dataset =sparkSession.read().schema(User_defined_Schema)
.option(PARSER_LIB, "commons")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

dataset.collect();
Result: 10yui000 is parsed as 10
Row : '1000','abc','10','400','20.8','2003-03-04',

Expected: According to the PERMISSIVE mode, 10yui000 should be replaced with 
null.
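
A minimal Scala sketch of the same read (the file path and the single-quote option are 
assumptions; the column types follow User_defined_Schema above), to make the expectation concrete:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("SPARK-20387-sketch").getOrCreate()

// Types taken from User_defined_Schema above; the trailing "_c6" column stays a string.
val schema = StructType(Seq(
  StructField("Col1", IntegerType, nullable = true),
  StructField("Col2", StringType, nullable = true),
  StructField("Col3", FloatType, nullable = true),
  StructField("Col4", StringType, nullable = true),
  StructField("Col5", DoubleType, nullable = true),
  StructField("Col6", DateType, nullable = true),
  StructField("_c6", StringType, nullable = true)))

val dataset = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE")
  .option("quote", "'")                 // assumed, since the sample rows are single-quoted
  .csv("/tmp/permissive-sample.csv")    // hypothetical path holding the rows above

// Expectation from the comment: a value such as "10yui000" that cannot be cast to
// float should come back as null in PERMISSIVE mode, not as a partially parsed 10.
dataset.show(false)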



was (Author: navya krishnappa):
Source File: 
'Col1','Col2','Col3','Col4','Col5','Col6',
'1000','abc','10yui000','400','20.8','2003-03-04',
'1001','xyz','3','4000','20.8','2003-03-04',
'1002','abc','4','40,000','20.8','2003-03-04'
'1003','xyz','5','40,','20.8','2003-03-04'
'1004','abc','6','40,000','20.8','2003-03-04'

User_defined_Schema:
[{
"dataType": "integer",
"type": "Measure",
"name": "Col1"
},
{
"dataType": "string",
"type": "Dimension",
"name": "Col2"
},
{
"dataType": "float",
"type": "Measure",
"name": "Col3"
},
{
"dataType": "string",
"type": "Dimension",
"name": "Col4"
},
{
"dataType": "double",
"type": "Measure",
"name": "Col5"
},
{
"dataType": "date",
"type": "Dimension",
"name": "Col6"
},
{
"dataType": "string",
"type": "Dimension",
"name": "_c6"
}

Source code1:
Dataset dataset =sparkSession.read().schema(User_defined_Schema)
.option(PARSER_LIB, "commons")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

dataset.collect();
Result: 10yui000 is parsed as 10
Row : '1000','abc','10','400','20.8','2003-03-04',

Expected: According to the PERMISSIVE mode, 10yui000 should be replaced with 
null.


> Permissive mode is not replacing corrupt record with null
> -
>
> Key: SPARK-20387
> URL: https://issues.apache.org/jira/browse/SPARK-20387
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned data by specifying "mode" as PERMISSIVE:
> Source File: 
> String,int,f1,bool1
> abc,23111,23.07738,true
> abc,23111,23.07738,true
> abc,23111,true,true
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> dataset.collect();

[jira] [Reopened] (SPARK-20387) Permissive mode is not replacing corrupt record with null

2017-11-13 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa reopened SPARK-20387:
--

Source File: 
'Col1','Col2','Col3','Col4','Col5','Col6',
'1000','abc','10yui000','400','20.8','2003-03-04',
'1001','xyz','3','4000','20.8','2003-03-04',
'1002','abc','4','40,000','20.8','2003-03-04'
'1003','xyz','5','40,','20.8','2003-03-04'
'1004','abc','6','40,000','20.8','2003-03-04'

User_defined_Schema:
[{
"dataType": "integer",
"type": "Measure",
"name": "Col1"
},
{
"dataType": "string",
"type": "Dimension",
"name": "Col2"
},
{
"dataType": "float",
"type": "Measure",
"name": "Col3"
},
{
"dataType": "string",
"type": "Dimension",
"name": "Col4"
},
{
"dataType": "double",
"type": "Measure",
"name": "Col5"
},
{
"dataType": "date",
"type": "Dimension",
"name": "Col6"
},
{
"dataType": "string",
"type": "Dimension",
"name": "_c6"
}

Source code1:
Dataset dataset =sparkSession.read().schema(User_defined_Schema)
.option(PARSER_LIB, "commons")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

dataset.collect();
Result: 10yui000 is parsed as 10
Row : '1000','abc','10','400','20.8','2003-03-04',

Expected: According to the PERMISSIVE mode, 10yui000 should be replaced with 
null.


> Permissive mode is not replacing corrupt record with null
> -
>
> Key: SPARK-20387
> URL: https://issues.apache.org/jira/browse/SPARK-20387
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned data by specifying "mode" as PERMISSIVE:
> Source File: 
> String,int,f1,bool1
> abc,23111,23.07738,true
> abc,23111,23.07738,true
> abc,23111,true,true
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> dataset.collect();
> Result: Error is thrown
> stack trace: 
> ERROR Executor: Exception in task 0.0 in stage 15.0 (TID 15)
> java.lang.IllegalArgumentException: For input string: "23.07738"
> at 
> scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:290)
> at 
> scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:260)
> at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:29)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:270)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:125)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:94)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:167)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:166)






[jira] [Updated] (SPARK-22439) Not able to get numeric columns for the file having decimal values

2017-11-03 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-22439:
-
Summary: Not able to get numeric columns for the file having decimal values 
 (was: Not able to get numeric columns for the attached file)

> Not able to get numeric columns for the file having decimal values
> --
>
> Key: SPARK-22439
> URL: https://issues.apache.org/jira/browse/SPARK-22439
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, SQL
>Affects Versions: 2.2.0
>Reporter: Navya Krishnappa
>Priority: Major
>
> When reading the below-mentioned decimal values with the header option set to true:
> SourceFile: 
> 8.95977565356765764E+20
> 8.95977565356765764E+20
> 8.95977565356765764E+20
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(HEADER, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> dataset.numericColumns()
> Result: 
> Caused by: java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:223)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:222)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)






[jira] [Updated] (SPARK-22439) Not able to get numeric columns for the attached file

2017-11-03 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-22439:
-
Summary: Not able to get numeric columns for the attached file  (was: Not 
able to get numeric column for the attached file)

> Not able to get numeric columns for the attached file
> -
>
> Key: SPARK-22439
> URL: https://issues.apache.org/jira/browse/SPARK-22439
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, SQL
>Affects Versions: 2.2.0
>Reporter: Navya Krishnappa
>Priority: Major
>
> When reading the below-mentioned decimal values with the header option set to true:
> SourceFile: 
> 8.95977565356765764E+20
> 8.95977565356765764E+20
> 8.95977565356765764E+20
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(HEADER, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> dataset.numericColumns()
> Result: 
> Caused by: java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:223)
>   at 
> org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:222)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)






[jira] [Created] (SPARK-22439) Not able to get numeric column for the attached file

2017-11-03 Thread Navya Krishnappa (JIRA)
Navya Krishnappa created SPARK-22439:


 Summary: Not able to get numeric column for the attached file
 Key: SPARK-22439
 URL: https://issues.apache.org/jira/browse/SPARK-22439
 Project: Spark
  Issue Type: Bug
  Components: Java API, SQL
Affects Versions: 2.2.0
Reporter: Navya Krishnappa
Priority: Major


When reading the below-mentioned decimal values with the header option set to true:

SourceFile: 
8.95977565356765764E+20
8.95977565356765764E+20
8.95977565356765764E+20

Source code1:
Dataset dataset = getSqlContext().read()
.option(PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(HEADER, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(ESCAPE, "\\")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

dataset.numericColumns()

Result: 
Caused by: java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at 
org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:223)
at 
org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:222)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)







[jira] [Reopened] (SPARK-22020) Support session local timezone

2017-09-20 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa reopened SPARK-22020:
--

This is not working as expected. Please refer to the description above.

> Support session local timezone
> --
>
> Key: SPARK-22020
> URL: https://issues.apache.org/jira/browse/SPARK-22020
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Navya Krishnappa
>
> As of Spark 2.1, Spark SQL assumes the machine timezone for datetime 
> manipulation, which is bad if users are not in the same timezones as the 
> machines, or if different users have different timezones.
> Input data:
> Date,SparkDate,SparkDate1,SparkDate2
> 04/22/2017T03:30:02,2017-03-21T03:30:02,2017-03-21T03:30:02.02Z,2017-03-21T00:00:00Z
> I have set the value below to change the time zone to UTC, but the current (local) 
> time-zone offset is still being added even though the input is already in UTC.
> spark.conf.set("spark.sql.session.timeZone", "UTC")
> Expected: the time should remain the same as the input, since it is already in UTC.
> var df1 = spark.read.option("delimiter", ",").option("qualifier", 
> "\"").option("inferSchema","true").option("header", "true").option("mode", 
> "PERMISSIVE").option("timestampFormat","MM/dd/'T'HH:mm:ss.SSS").option("dateFormat",
>  "MM/dd/'T'HH:mm:ss").csv("DateSpark.csv");
> df1: org.apache.spark.sql.DataFrame = [Name: string, Age: int ... 5 more 
> fields]
> scala> df1.show(false);
> +----+---+----+-------------------+-------------------+----------------------+-------------------+
> |Name|Age|Add |Date               |SparkDate          |SparkDate1            |SparkDate2         |
> +----+---+----+-------------------+-------------------+----------------------+-------------------+
> |abc |21 |bvxc|04/22/2017T03:30:02|2017-03-21 03:30:02|2017-03-21 09:00:02.02|2017-03-21 05:30:00|
> +----+---+----+-------------------+-------------------+----------------------+-------------------+






[jira] [Created] (SPARK-22020) Support session local timezone

2017-09-14 Thread Navya Krishnappa (JIRA)
Navya Krishnappa created SPARK-22020:


 Summary: Support session local timezone
 Key: SPARK-22020
 URL: https://issues.apache.org/jira/browse/SPARK-22020
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Navya Krishnappa


As of Spark 2.1, Spark SQL assumes the machine timezone for datetime 
manipulation, which is bad if users are not in the same timezones as the 
machines, or if different users have different timezones.

Input data:
Date,SparkDate,SparkDate1,SparkDate2
04/22/2017T03:30:02,2017-03-21T03:30:02,2017-03-21T03:30:02.02Z,2017-03-21T00:00:00Z

I have set the value below to change the time zone to UTC, but the current (local) 
time-zone offset is still being added even though the input is already in UTC.
spark.conf.set("spark.sql.session.timeZone", "UTC")

Expected: the time should remain the same as the input, since it is already in UTC.
var df1 = spark.read.option("delimiter", ",").option("qualifier", 
"\"").option("inferSchema","true").option("header", "true").option("mode", 
"PERMISSIVE").option("timestampFormat","MM/dd/'T'HH:mm:ss.SSS").option("dateFormat",
 "MM/dd/'T'HH:mm:ss").csv("DateSpark.csv");
df1: org.apache.spark.sql.DataFrame = [Name: string, Age: int ... 5 more fields]
scala> df1.show(false);
+----+---+----+-------------------+-------------------+----------------------+-------------------+
|Name|Age|Add |Date               |SparkDate          |SparkDate1            |SparkDate2         |
+----+---+----+-------------------+-------------------+----------------------+-------------------+
|abc |21 |bvxc|04/22/2017T03:30:02|2017-03-21 03:30:02|2017-03-21 09:00:02.02|2017-03-21 05:30:00|
+----+---+----+-------------------+-------------------+----------------------+-------------------+
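
A minimal Scala sketch of the same read with the session time zone set (the full "yyyy" in the 
patterns is inferred from the sample date, and the session time-zone setting assumes Spark 2.2 or later):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("SPARK-22020-sketch").getOrCreate()

// Session-local time zone: timestamps are kept internally as instants; this setting is
// meant to control how zone-less strings are parsed and how timestamps are rendered.
spark.conf.set("spark.sql.session.timeZone", "UTC")

val df1 = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("mode", "PERMISSIVE")
  .option("timestampFormat", "MM/dd/yyyy'T'HH:mm:ss.SSS")
  .option("dateFormat", "MM/dd/yyyy'T'HH:mm:ss")
  .csv("DateSpark.csv")

// Expectation from the description: values that already carry a Z offset
// (e.g. 2017-03-21T03:30:02.02Z) should display unchanged when the session zone is UTC.
df1.show(false)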






[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2017-07-03 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072240#comment-16072240
 ] 

Navya Krishnappa edited comment on SPARK-18877 at 7/4/17 5:42 AM:
--

[~dongjoon] I have created a Parquet bug for the invalid-scale issue with the Decimal data 
type, but the Parquet team says it is a Spark issue. Please refer to 
https://issues.apache.org/jira/browse/PARQUET-815 and add your comments.


was (Author: navya krishnappa):
[~dongjoon] I have created parquet bug for the invalid scale issue in Decimal 
data type. But Parquet team is telling its a Spark issue. Please refer 
https://issues.apache.org/jira/browse/PARQUET-815 and add your comments.

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>Assignee: Dongjoon Hyun
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>
> When reading the below-mentioned CSV data, even though the maximum decimal 
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253
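
For the data quoted above, a minimal Scala sketch of one way to sidestep inference entirely by 
supplying an explicit wide decimal type (the file path and SparkSession setup are assumptions):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("SPARK-18877-sketch").getOrCreate()

// An explicit schema wide enough for the sample values, instead of relying on inference.
val schema = StructType(Seq(StructField("Decimal", DecimalType(38, 0), nullable = true)))

val df = spark.read
  .schema(schema)
  .option("header", "true")
  .csv("/tmp/decimals.csv")   // hypothetical file containing the values listed above

df.show(false)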






[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2017-07-03 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072240#comment-16072240
 ] 

Navya Krishnappa edited comment on SPARK-18877 at 7/3/17 10:33 AM:
---

[~dongjoon] I have created a Parquet bug for the invalid-scale issue with the Decimal data 
type, but the Parquet team says it is a Spark issue. Please refer to 
https://issues.apache.org/jira/browse/PARQUET-815 and add your comments.


was (Author: navya krishnappa):
[~dongjoon] I have created parquet bug the invalid scale issue for Decimal data 
type. But Parquet team is telling its a Spark issue. Please refer 
https://issues.apache.org/jira/browse/PARQUET-815 and add your comments.

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>Assignee: Dongjoon Hyun
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>
> When reading the below-mentioned CSV data, even though the maximum decimal 
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253






[jira] [Commented] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2017-07-03 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072240#comment-16072240
 ] 

Navya Krishnappa commented on SPARK-18877:
--

[~dongjoon] I have created a Parquet bug for the invalid-scale issue with the Decimal data 
type, but the Parquet team says it is a Spark issue. Please refer to 
https://issues.apache.org/jira/browse/PARQUET-815 and add your comments.

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>Assignee: Dongjoon Hyun
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>
> When reading the below-mentioned CSV data, even though the maximum decimal 
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253






[jira] [Commented] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double

2017-07-03 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072234#comment-16072234
 ] 

Navya Krishnappa commented on SPARK-21263:
--

[~sowen] & [~hyukjin.kwon] Thanks for your comments. Please let me know the resolution 
for this issue.

> NumberFormatException is not thrown while converting an invalid string to 
> float/double
> --
>
> Key: SPARK-21263
> URL: https://issues.apache.org/jira/browse/SPARK-21263
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.1
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned data by specifying a user-defined schema, an 
> exception is not thrown. Refer to the details:
> *Data:* 
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','3'
> '1002','Patient3','4'
> '1003','Patient4','5'
> '1004','Patient5','6'
> *Source code*: 
> Dataset dataset = sparkSession.read().schema(schema)
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> When we collect the dataset data: 
> dataset.collectAsList();
> *Schema1*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,IntegerType,true)]
> *Result*: Throws NumberFormatException 
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
> *Schema2*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,DoubleType,true)]
> *Actual Result*: 
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> *Expected Result*: Should throw NumberFormatException for input string 
> "10u000"






[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double

2017-06-30 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-21263:
-
Description: 
When reading the below-mentioned data by specifying a user-defined schema, an 
exception is not thrown. Refer to the details:

*Data:* 
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

*Source code*: 
Dataset dataset = sparkSession.read().schema(schema)
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

When we collect the dataset data: 
dataset.collectAsList();

*Schema1*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,IntegerType,true)]
*Result*: Throws NumberFormatException 
Caused by: java.lang.NumberFormatException: For input string: "10u000"

*Schema2*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,DoubleType,true)]
*Actual Result*: 
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
*Expected Result*: Should throw NumberFormatException for input string "10u000"
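
A minimal Scala sketch of the two schema variants side by side (the file path and the 
single-quote option are assumptions):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("SPARK-21263-sketch").getOrCreate()

def readWith(totalBillType: DataType) = {
  val schema = StructType(Seq(
    StructField("PatientID", IntegerType, nullable = true),
    StructField("PatientName", StringType, nullable = true),
    StructField("TotalBill", totalBillType, nullable = true)))
  spark.read
    .schema(schema)
    .option("header", "true")
    .option("quote", "'")           // assumed, since the sample rows are single-quoted
    .option("mode", "PERMISSIVE")
    .csv("/tmp/patients.csv")       // hypothetical path holding the rows above
}

// Schema2 (DoubleType): per the report, "10u000" is silently turned into 10.0.
readWith(DoubleType).collectAsList()

// Schema1 (IntegerType): per the report, the same value raises
// java.lang.NumberFormatException: For input string: "10u000".
readWith(IntegerType).collectAsList()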

  was:
When reading a below-mentioned data by specifying user-defined schema, 
exception is not thrown. Refer the 

*Data:* 
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

*Source code*: 
Dataset dataset = sparkSession.read().schema(schema)
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

When we collect the dataset data: 
dataset.collectAsList();

*Schema1*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,IntegerType,true)]
*Result *: Throws NumerFormatException 
Caused by: java.lang.NumberFormatException: For input string: "10u000"

*Schema2*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,DoubleType,true)]
*Actual Result*: 
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
*Expected Result*: Should throw NumberFormatException for input string "10u000"


> NumberFormatException is not thrown while converting an invalid string to 
> float/double
> --
>
> Key: SPARK-21263
> URL: https://issues.apache.org/jira/browse/SPARK-21263
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.1
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned data by specifying a user-defined schema, an 
> exception is not thrown. Refer to the details:
> *Data:* 
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','3'
> '1002','Patient3','4'
> '1003','Patient4','5'
> '1004','Patient5','6'
> *Source code*: 
> Dataset dataset = sparkSession.read().schema(schema)
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> When we collect the dataset data: 
> dataset.collectAsList();
> *Schema1*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,IntegerType,true)]
> *Result*: Throws NumberFormatException 
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
> *Schema2*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,DoubleType,true)]
> *Actual Result*: 
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> *Expected Result*: Should throw NumberFormatException for input string 
> "10u000"






[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double

2017-06-30 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-21263:
-
Description: 
When reading a below-mentioned data by specifying user-defined schema, 
exception is not thrown. Refer the 

*Data:* 
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

*Source code*: 
Dataset dataset = sparkSession.read().schema(schema)
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

When we collect the dataset data: 
dataset.collectAsList();

*Schema1*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,IntegerType,true)]
*Result*: Throws NumberFormatException 
Caused by: java.lang.NumberFormatException: For input string: "10u000"

*Schema2*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,DoubleType,true)]
*Actual Result*: 
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
*Expected Result*: Should throw NumberFormatException for input string "10u000"

  was:
When reading a below-mentioned data by specifying user-defined schema, 
exception is not thrown.

*Data:* 
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

*Source code*: 
Dataset dataset = sparkSession.read().schema(schema)
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

When we collect the dataset data: 
dataset.collectAsList();

*Schema1*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,IntegerType,true)]
*Result *: Throws NumerFormatException 
Caused by: java.lang.NumberFormatException: For input string: "10u000"

*Schema2*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,DoubleType,true)]
*Actual Result*: 
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
*Expected Result*: Should throw NumberFormatException for input string "10u000"


> NumberFormatException is not thrown while converting an invalid string to 
> float/double
> --
>
> Key: SPARK-21263
> URL: https://issues.apache.org/jira/browse/SPARK-21263
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.1
>Reporter: Navya Krishnappa
>
> When reading a below-mentioned data by specifying user-defined schema, 
> exception is not thrown. Refer the 
> *Data:* 
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','3'
> '1002','Patient3','4'
> '1003','Patient4','5'
> '1004','Patient5','6'
> *Source code*: 
> Dataset dataset = sparkSession.read().schema(schema)
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> When we collect the dataset data: 
> dataset.collectAsList();
> *Schema1*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,IntegerType,true)]
> *Result *: Throws NumerFormatException 
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
> *Schema2*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,DoubleType,true)]
> *Actual Result*: 
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> *Expected Result*: Should throw NumberFormatException for input string 
> "10u000"






[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double

2017-06-30 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-21263:
-
Description: 
When reading the below-mentioned data by specifying a user-defined schema, an 
exception is not thrown.

*Data:* 
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

*Source code*: 
Dataset dataset = sparkSession.read().schema(schema)
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

When we collect the dataset data: 
dataset.collectAsList();

*Schema1*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,IntegerType,true)]
*Result*: Throws NumberFormatException 
Caused by: java.lang.NumberFormatException: For input string: "10u000"

*Schema2*: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,DoubleType,true)]
*Actual Result*: 
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
*Expected Result*: Should throw NumberFormatException for input string "10u000"

  was:
When reading a below-mentioned data by specifying user-defined schema, 
exception is not thrown.

Data
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

Source code: 
Dataset dataset = sparkSession.read().schema(schema)
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

When we collect the dataset data: 
dataset.collectAsList();

Schema1: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,IntegerType,true)]
*Result *: Throws NumerFormatException 
Caused by: java.lang.NumberFormatException: For input string: "10u000"

Schema2: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,DoubleType,true)]
*Actual Result*: 
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
*Expected Result*: Should throw NumberFormatException for input string "10u000"


> NumberFormatException is not thrown while converting an invalid string to 
> float/double
> --
>
> Key: SPARK-21263
> URL: https://issues.apache.org/jira/browse/SPARK-21263
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.1
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned data by specifying a user-defined schema, an 
> exception is not thrown.
> *Data:* 
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','3'
> '1002','Patient3','4'
> '1003','Patient4','5'
> '1004','Patient5','6'
> *Source code*: 
> Dataset dataset = sparkSession.read().schema(schema)
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> When we collect the dataset data: 
> dataset.collectAsList();
> *Schema1*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,IntegerType,true)]
> *Result*: Throws NumberFormatException 
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
> *Schema2*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,DoubleType,true)]
> *Actual Result*: 
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> *Expected Result*: Should throw NumberFormatException for input string 
> "10u000"






[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double

2017-06-30 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-21263:
-
Description: 
When reading the below-mentioned data by specifying a user-defined schema, an 
exception is not thrown.

Data
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

Source code: 
Dataset dataset = sparkSession.read().schema(schema)
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

When we collect the dataset data: 
dataset.collectAsList();

Schema1: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,IntegerType,true)]
*Result*: Throws NumberFormatException 
Caused by: java.lang.NumberFormatException: For input string: "10u000"

Schema2: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,DoubleType,true)]
*Actual Result*: 
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
*Expected Result*: Should throw NumberFormatException for input string "10u000"

  was:
When reading a below-mentioned data by specifying user-defined schema, 
exception is not thrown.

Data
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

Schema1: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,IntegerType,true)]
Result : Throws NumerFormatException 
Caused by: java.lang.NumberFormatException: For input string: "10u000"

Schema2: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,DoubleType,true)]
Actual Result: 
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
Expected Result: Should throw NumberFormatException for input string "10u000"


> NumberFormatException is not thrown while converting an invalid string to 
> float/double
> --
>
> Key: SPARK-21263
> URL: https://issues.apache.org/jira/browse/SPARK-21263
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.1
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned data by specifying a user-defined schema, an 
> exception is not thrown.
> Data
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','3'
> '1002','Patient3','4'
> '1003','Patient4','5'
> '1004','Patient5','6'
> Source code: 
> Dataset dataset = sparkSession.read().schema(schema)
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> When we collect the dataset data: 
> dataset.collectAsList();
> Schema1: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,IntegerType,true)]
> *Result*: Throws NumberFormatException 
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
> Schema2: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,DoubleType,true)]
> *Actual Result*: 
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> *Expected Result*: Should throw NumberFormatException for input string 
> "10u000"






[jira] [Created] (SPARK-21263) Exception is not thrown while converting an invalid string to float/double

2017-06-30 Thread Navya Krishnappa (JIRA)
Navya Krishnappa created SPARK-21263:


 Summary: Exception is not thrown while converting an invalid 
string to float/double
 Key: SPARK-21263
 URL: https://issues.apache.org/jira/browse/SPARK-21263
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 2.1.1
Reporter: Navya Krishnappa


When reading the below-mentioned data by specifying a user-defined schema, an 
exception is not thrown.

Data
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

Schema1: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,IntegerType,true)]
Result: Throws NumberFormatException 
Caused by: java.lang.NumberFormatException: For input string: "10u000"

Schema2: 
[StructField(PatientID,IntegerType,true), 
StructField(PatientName,StringType,true), 
StructField(TotalBill,DoubleType,true)]
Actual Result: 
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
Expected Result: Should throw NumberFormatException for input string "10u000"






[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double

2017-06-30 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-21263:
-
Summary: NumberFormatException is not thrown while converting an invalid 
string to float/double  (was: Exception is not thrown while converting an 
invalid string to float/double)

> NumberFormatException is not thrown while converting an invalid string to 
> float/double
> --
>
> Key: SPARK-21263
> URL: https://issues.apache.org/jira/browse/SPARK-21263
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.1
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned data by specifying a user-defined schema, an 
> exception is not thrown.
> Data
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','3'
> '1002','Patient3','4'
> '1003','Patient4','5'
> '1004','Patient5','6'
> Schema1: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,IntegerType,true)]
> Result: Throws NumberFormatException 
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
> Schema2: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,DoubleType,true)]
> Actual Result: 
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> Expected Result: Should throw NumberFormatException for input string "10u000"






[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2017-05-22 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756554#comment-15756554
 ] 

Navya Krishnappa edited comment on SPARK-18877 at 5/23/17 4:21 AM:
---

Thank you for replying, [~dongjoon]. Can you help me understand whether the 
above-mentioned PR will resolve the issue described below?

I have another issue with respect to the decimal scale. When I try to read the 
below-mentioned CSV source file and create a Parquet file from it, a 
java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 is thrown.


The source file content is 
Row(column name)
9.03E+12
1.19E+11

Refer to the code below, used to read the CSV file and create a Parquet file:

//Read the csv file
Dataset dataset = getSqlContext().read()
.option(HEADER, "true")
.option(PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(ESCAPE, "\\")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile)

// create an parquet file
dataset.write().parquet("//path.parquet")


Stack trace:

Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250)
at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
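
One possible workaround, sketched here rather than taken from the thread: cast the inferred 
negative-scale decimal column to double (or to an explicit decimal with a non-negative scale) 
before writing to Parquet, assuming the precision loss is acceptable.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

val spark = SparkSession.builder().master("local[*]").appName("SPARK-18877-workaround").getOrCreate()

val dataset = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/tmp/scientific.csv")   // hypothetical file with the 9.03E+12 / 1.19E+11 rows above

// Parquet rejects decimals with a negative scale ("Invalid DECIMAL scale: -9"),
// so cast every column to double before the write.
val casted = dataset.columns.foldLeft(dataset)((d, c) => d.withColumn(c, col(c).cast(DoubleType)))

casted.write.mode("overwrite").parquet("/tmp/out.parquet")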



was (Author: navya krishnappa):
Thank you for replying [~dongjoon]. Can you help me in understanding whether 
the above mentioned PR will resolve the below mentioned issue.

I have another issue with respect to the decimal scale. When i'm trying to read 
the below mentioned csv source file and creating an parquet file from that 
throws an java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 
exception.


The source file content is 
Row(column name)
9.03E+12
1.19E+11

 Refer the given code used read the csv file and creating an parquet file:

//Read the csv file
Dataset dataset = getSqlContext().read()
.option(HEADER, "true")
.option(PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(ESCAPE, "\\")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile)

// create an parquet file
dataset.write().parquet("//path.parquet")


Stack trace:

Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250)
at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasou

[jira] [Updated] (SPARK-20387) Permissive mode is not replacing corrupt record with null

2017-04-19 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-20387:
-
Description: 
When reading the below-mentioned data by specifying "mode" as PERMISSIVE:

Source File: 
String,int,f1,bool1
abc,23111,23.07738,true
abc,23111,23.07738,true
abc,23111,true,true

Source code1:
Dataset dataset = getSqlContext().read()
.option(PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);
dataset.collect();

Result: Error is thrown
stack trace: 
ERROR Executor: Exception in task 0.0 in stage 15.0 (TID 15)
java.lang.IllegalArgumentException: For input string: "23.07738"
at 
scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:290)
at 
scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:260)
at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:29)
at 
org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:270)
at 
org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:125)
at 
org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:94)
at 
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:167)
at 
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:166)
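
A sketch of one way to capture such rows instead of failing, using an explicit schema plus a 
corrupt-record column (CSV support for the columnNameOfCorruptRecord option is an assumption 
here and may require a Spark release newer than 2.1):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("SPARK-20387-sketch").getOrCreate()

val schema = StructType(Seq(
  StructField("String", StringType, nullable = true),
  StructField("int", IntegerType, nullable = true),
  StructField("f1", DoubleType, nullable = true),
  StructField("bool1", BooleanType, nullable = true),
  StructField("_corrupt_record", StringType, nullable = true)))   // extra column for bad rows

val dataset = spark.read
  .schema(schema)
  .option("header", "true")
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .csv("/tmp/bool-sample.csv")   // hypothetical path holding the four-column rows above

// Rows whose fields cannot be cast should end up as nulls plus the raw line in
// _corrupt_record, rather than aborting the whole collect().
dataset.show(false)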

> Permissive mode is not replacing corrupt record with null
> -
>
> Key: SPARK-20387
> URL: https://issues.apache.org/jira/browse/SPARK-20387
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned data by specifying "mode" as PERMISSIVE:
> Source File: 
> String,int,f1,bool1
> abc,23111,23.07738,true
> abc,23111,23.07738,true
> abc,23111,true,true
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> dataset.collect();
> Result: Error is thrown
> stack trace: 
> ERROR Executor: Exception in task 0.0 in stage 15.0 (TID 15)
> java.lang.IllegalArgumentException: For input string: "23.07738"
> at 
> scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:290)
> at 
> scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:260)
> at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:29)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:270)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:125)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:94)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:167)
> at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:166)






[jira] [Created] (SPARK-20387) Permissive mode is not replacing corrupt record with null

2017-04-19 Thread Navya Krishnappa (JIRA)
Navya Krishnappa created SPARK-20387:


 Summary: Permissive mode is not replacing corrupt record with null
 Key: SPARK-20387
 URL: https://issues.apache.org/jira/browse/SPARK-20387
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 2.1.0
Reporter: Navya Krishnappa






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18936) Infrastructure for session local timezone support

2017-03-31 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950748#comment-15950748
 ] 

Navya Krishnappa edited comment on SPARK-18936 at 3/31/17 11:51 AM:


I think this fix lets us set the time zone in the Spark configuration. If so, 
can we set "UTC" as my time zone?

Let me know if I have misunderstood the document.


was (Author: navya krishnappa):
I think this fix helps us to set the time zone in the spark configurations. If 
it's so Can we set "UTC" time zone??

And let me know if I misunderstood the document.

> Infrastructure for session local timezone support
> -
>
> Key: SPARK-18936
> URL: https://issues.apache.org/jira/browse/SPARK-18936
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Takuya Ueshin
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18936) Infrastructure for session local timezone support

2017-03-31 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950748#comment-15950748
 ] 

Navya Krishnappa commented on SPARK-18936:
--

I think this fix lets us set the time zone in the Spark configuration. If so, 
can we set the "UTC" time zone?

Let me know if I have misunderstood the document.
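
For example, a minimal sketch of what is being asked (assuming the session-local 
time zone introduced by this ticket is exposed through the 
spark.sql.session.timeZone configuration):

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .config("spark.sql.session.timeZone", "UTC")     // set at session construction time ...
    .getOrCreate();

spark.conf().set("spark.sql.session.timeZone", "UTC");   // ... or change it later at runtime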

> Infrastructure for session local timezone support
> -
>
> Key: SPARK-18936
> URL: https://issues.apache.org/jira/browse/SPARK-18936
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Takuya Ueshin
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-31 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950745#comment-15950745
 ] 

Navya Krishnappa commented on SPARK-20152:
--

[~srowen] & [~hyukjin.kwon] Thank you for your comments. 

> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DELIMITER, ",") 
> .option(QUOTE, "\"") 
> .option(ESCAPE, "\\") 
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
> .option(MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
> expected result is TimeCoumn should consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949399#comment-15949399
 ] 

Navya Krishnappa edited comment on SPARK-20152 at 3/30/17 4:53 PM:
---

But if we specify timestampFormat "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" and parse 
"2017-03-21T00:00:00Z", it works fine. The same does not hold when parsing 
"03-21-2017T03:30:02Z" with the "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" format. 
Let me know if my inputs are wrong.
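
For reference, a minimal sketch (the literal option keys, the explicit schema 
and the printSchema/show calls are assumptions added here, not the exact code 
from the report) that declares TimeColumn as TimestampType up front, so that 
only the custom pattern, and not schema inference, is exercised:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

StructType schema = new StructType().add("TimeColumn", DataTypes.TimestampType);

Dataset<Row> dataset = getSqlContext().read()
    .option("header", "true")
    .option("timestampFormat", "MM-dd-yyyy'T'HH:mm:ss.SSSZZ")
    .option("mode", "PERMISSIVE")
    .schema(schema)                  // TimeColumn is TimestampType regardless of inference
    .csv(sourceFile);

dataset.printSchema();
dataset.show(false);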


was (Author: navya krishnappa):
But if we specify timestampformat: "-MM-dd'T'HH:mm:ss.SSSZZ" and parse 
"2017-03-21T00:00:00Z", it is working fine. Same scenario is not applied while 
parsing "03-21-2017T03:30:02Z" with "MM-dd-'T'HH:mm:ss.SSSZZ" format.  Let 
me know if my inputs are wrong.

> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DELIMITER, ",") 
> .option(QUOTE, "\"") 
> .option(ESCAPE, "\\") 
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
> .option(MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
> expected result is TimeCoumn should consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949399#comment-15949399
 ] 

Navya Krishnappa commented on SPARK-20152:
--

But if we specify timestampFormat "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" and parse 
"2017-03-21T00:00:00Z", it works fine. The same does not hold when parsing 
"03-21-2017T03:30:02Z" with the "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" format. 
Let me know if my inputs are wrong.

> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DELIMITER, ",") 
> .option(QUOTE, "\"") 
> .option(ESCAPE, "\\") 
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
> .option(MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
> expected result is TimeCoumn should consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948989#comment-15948989
 ] 

Navya Krishnappa edited comment on SPARK-20152 at 3/30/17 12:48 PM:


According to Spark, "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" is the default timestamp 
format. In the above-mentioned example, I have swapped the date fields, and I'm 
using valid pattern letters in my format.


was (Author: navya krishnappa):
According to the spark "-MM-dd'T'HH:mm:ss.SSSZZ" is default timestamp 
format. In examples, i have swapped the date fields. And I'm using valid 
letters in my format.

> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DELIMITER, ",") 
> .option(QUOTE, "\"") 
> .option(ESCAPE, "\\") 
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
> .option(MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
> expected result is TimeCoumn should consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948989#comment-15948989
 ] 

Navya Krishnappa commented on SPARK-20152:
--

According to Spark, "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" is the default timestamp 
format. In the examples, I have swapped the date fields, and I'm using valid 
pattern letters in my format.

> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DELIMITER, ",") 
> .option(QUOTE, "\"") 
> .option(ESCAPE, "\\") 
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
> .option(MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
> expected result is TimeCoumn should consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2017-03-30 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756554#comment-15756554
 ] 

Navya Krishnappa edited comment on SPARK-18877 at 3/30/17 12:45 PM:


Thank you for replying, [~dongjoon]. Can you help me understand whether the 
above-mentioned PR will resolve the below-mentioned issue?

I have another issue with respect to the decimal scale. When I try to read the 
below-mentioned CSV source file and create a Parquet file from it, a 
java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 exception is 
thrown.


The source file content is 
Row(column name)
9.03E+12
1.19E+11

 Refer to the code below, used to read the CSV file and create the Parquet file:

//Read the csv file
Dataset dataset = getSqlContext().read()
.option(HEADER, "true")
.option(PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(ESCAPE, "\\")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile)

// create a Parquet file
dataset.write().parquet("//path.parquet")


Stack trace:

Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250)
at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetFileFormat.scala:562)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)



was (Author: navya krishnappa):
Thank you for replying [~dongjoon]. Can you help me in understanding whether 
the above mentioned PR will resolve the below mentioned issue.

I have another issue with respect to the decimal scale. When i'm trying to read 
the below mentioned csv source file and creating an parquet file from that 
throws an java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 
exception.


The source file content is 
Row(column name)
9.03E+12
1.19E+11

 Refer the given code used read the csv file and creating an parquet file:

//Read the csv file
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.HEADER, "true")
.option(DAWBConstant.PARSER_LIB, "commons")
.

[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-20152:
-
Description: 
When reading the below-mentioned time value by specifying the "timestampFormat" 
"MM-dd-yyyy'T'HH:mm:ss.SSSZZ", the time zone is ignored.

Source File: 
TimeColumn
03-21-2017T03:30:02Z

Source code1:
Dataset dataset = getSqlContext().read()
.option(PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn is of StringType and the value is "03-21-2017T03:30:02Z", but 
the expected result is that TimeColumn should be of TimestampType and the time 
zone should be taken into account.

Source code2: 
Dataset dataset = getSqlContext().read() 
.option(PARSER_LIB, "commons") 
.option(INFER_SCHEMA, "true") 
.option(DELIMITER, ",") 
.option(QUOTE, "\"") 
.option(ESCAPE, "\\") 
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
.option(MODE, Mode.PERMISSIVE) 
.csv(sourceFile); 

Result: TimeColumn is of TimestampType and the value is "2017-03-21 03:30:02.0", 
but the expected result is that the time zone should be taken into account.

  was:
When reading the below mentioned time value by specifying the 
"timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.

Source File: 
TimeColumn
03-21-2017T03:30:02Z

Source code1:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
expected result is TimeCoumn should be of "TimestampType"  and should consider 
time zone for manipulation

Source code2: 
Dataset dataset = getSqlContext().read() 
.option(DAWBConstant.PARSER_LIB, "commons") 
.option(INFER_SCHEMA, "true") 
.option(DAWBConstant.DELIMITER, ",") 
.option(DAWBConstant.QUOTE, "\"") 
.option(DAWBConstant.ESCAPE, "\\") 
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
.option(DAWBConstant.MODE, Mode.PERMISSIVE) 
.csv(sourceFile); 

Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
expected result is TimeCoumn should consider time zone for manipulation


> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DELIMITER, ",") 
> .option(QUOTE, "\"") 
> .option(ESCAPE, "\\") 
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
> .option(MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
> expected result is TimeCoumn should consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2017-03-30 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754865#comment-15754865
 ] 

Navya Krishnappa edited comment on SPARK-18877 at 3/30/17 12:44 PM:


I'm using getSqlContext().read() to read the content. Refer to the code below, 
used to read the CSV file.

Dataset dataset = getSqlContext().read()
.option(HEADER, "true")
.option(PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DELIMITER, ",")
.option(QUOTE, "\"")
.option(ESCAPE, "\\")
.option(MODE, Mode.PERMISSIVE)
.csv(sourceFile);

If we collect the dataset (dataset.collect()), I get a 
java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
exceeds max precision 20 exception.
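
A minimal sketch of a possible workaround (the explicit schema and the literal 
option keys are assumptions, not the constants used above): declare the column 
as a wide decimal instead of relying on inference, so that 16-digit values such 
as 2323366225312000 fit without tripping the precision check:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

StructType schema = new StructType()
    .add("Decimal", DataTypes.createDecimalType(38, 0));   // maximum precision, scale 0

Dataset<Row> dataset = getSqlContext().read()
    .option("header", "true")
    .option("delimiter", ",")
    .option("quote", "\"")
    .option("escape", "\\")
    .option("mode", "PERMISSIVE")
    .schema(schema)                   // no inferSchema, so no inferred precision
    .csv(sourceFile);

dataset.collect();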


was (Author: navya krishnappa):
I'm using SparkContext.read() to read the content. Refer the given code using 
to read the csv file.

Dataset dataset = getSqlContext().read()
.option(DAWBConstant.HEADER, "true")
.option(DAWBConstant.PARSER_LIB, "commons")
.option(DAWBConstant.INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

if we collect the dataset (dataset.collect()). i will get 
java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
exceeds max precision 20 exception.

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>Assignee: Dongjoon Hyun
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>
> When reading below mentioned csv data, even though the maximum decimal 
> precision is 38, following exception is thrown 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-20152:
-
Description: 
When reading the below mentioned time value by specifying the 
"timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.

Source File: 
TimeColumn
03-21-2017T03:30:02Z

Source code1:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
expected result is TimeCoumn should be of "TimestampType"  and should consider 
time zone for manipulation

Source code2: 
Dataset dataset = getSqlContext().read() 
.option(DAWBConstant.PARSER_LIB, "commons") 
.option(INFER_SCHEMA, "true") 
.option(DAWBConstant.DELIMITER, ",") 
.option(DAWBConstant.QUOTE, "\"") 
.option(DAWBConstant.ESCAPE, "\\") 
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
.option(DAWBConstant.MODE, Mode.PERMISSIVE) 
.csv(sourceFile); 

Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
expected result is TimeCoumn should consider time zone for manipulation

  was:
When reading the below mentioned time value by specifying the 
"timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.

Source File: 
TimeColumn
03-21-2017T03:30:02Z

Source code1:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
expected result is TimeCoumn should be of "TimestampType"  and should consider 
time zone for manipulation

Source code2: 
Dataset dataset = getSqlContext().read() 
.option(DAWBConstant.PARSER_LIB, "commons") 
.option(INFER_SCHEMA, "true") 
.option(DAWBConstant.DELIMITER, ",") 
.option(DAWBConstant.QUOTE, "\"") 
.option(DAWBConstant.ESCAPE, "\\") 
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
.option(DAWBConstant.MODE, Mode.PERMISSIVE) 
.csv(sourceFile); 

Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but 
expected result is TimeCoumn should consider time zone for manipulation


> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(DAWBConstant.PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DAWBConstant.DELIMITER, ",")
> .option(DAWBConstant.QUOTE, "\"")
> .option(DAWBConstant.ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(DAWBConstant.MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(DAWBConstant.PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DAWBConstant.DELIMITER, ",") 
> .option(DAWBConstant.QUOTE, "\"") 
> .option(DAWBConstant.ESCAPE, "\\") 
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
> .option(DAWBConstant.MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but 
> expected result is TimeCoumn should consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-20152:
-
Description: 
When reading the below mentioned time value by specifying the 
"timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.

Source File: 
TimeColumn
03-21-2017T03:30:02Z

Source code1:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
expected result is TimeCoumn should be of "TimestampType"  and should consider 
time zone for manipulation

Source code2: 
Dataset dataset = getSqlContext().read() 
.option(DAWBConstant.PARSER_LIB, "commons") 
.option(INFER_SCHEMA, "true") 
.option(DAWBConstant.DELIMITER, ",") 
.option(DAWBConstant.QUOTE, "\"") 
.option(DAWBConstant.ESCAPE, "\\") 
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
.option(DAWBConstant.MODE, Mode.PERMISSIVE) 
.csv(sourceFile); 

Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but 
expected result is TimeCoumn should consider time zone for manipulation

  was:
When reading the below mentioned time value by specifying the 
"timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.

Source File: 
TimeColumn
03-21-2017T03:30:02Z

Source code:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
expected result is TimeCoumn should be of "TimestampType"  and should consider 
time zone for manipulation


> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(DAWBConstant.PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DAWBConstant.DELIMITER, ",")
> .option(DAWBConstant.QUOTE, "\"")
> .option(DAWBConstant.ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(DAWBConstant.MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(DAWBConstant.PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DAWBConstant.DELIMITER, ",") 
> .option(DAWBConstant.QUOTE, "\"") 
> .option(DAWBConstant.ESCAPE, "\\") 
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") 
> .option(DAWBConstant.MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but 
> expected result is TimeCoumn should consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-20152:
-
Description: 
When reading the below mentioned time value by specifying the 
"timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.

Source File: 
TimeColumn
03-21-2017T03:30:02Z

Source code:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
expected result is TimeCoumn should be of "TimestampType"  and should consider 
time zone for manipulation

  was:
When reading the below mentioned time value by specifying the 
"timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.

Source File: 
TimeColumn
03-21-2017T03:30:02Z

Source code1:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
expected result is TimeCoumn should be of "TimestampType"  and should consider 
time zone for manipulation

Source code2:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but 
expected result is TimeCoumn should consider time zone for manipulation


> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code:
> Dataset dataset = getSqlContext().read()
> .option(DAWBConstant.PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DAWBConstant.DELIMITER, ",")
> .option(DAWBConstant.QUOTE, "\"")
> .option(DAWBConstant.ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(DAWBConstant.MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-29 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-20152:
-
Description: 
When reading the below mentioned time value by specifying the 
"timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.

Source File: 
TimeColumn
03-21-2017T03:30:02Z

Source code1:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
expected result is TimeCoumn should be of "TimestampType"  and should consider 
time zone for manipulation

Source code2:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.PARSER_LIB, "commons")
.option(INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option("timestampFormat" , "MM-dd-'T'HH:mm:ss")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but 
expected result is TimeCoumn should consider time zone for manipulation

  was:
When reading the below mentioned time value by specifying the 
"timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.

Sample data: 
TimeColumn
03-21-2017T03:30:02Z


Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z"

Expected Result: TimeCoumn should be of "TimestampType" 


> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying the 
> "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(DAWBConstant.PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DAWBConstant.DELIMITER, ",")
> .option(DAWBConstant.QUOTE, "\"")
> .option(DAWBConstant.ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ")
> .option(DAWBConstant.MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but 
> expected result is TimeCoumn should be of "TimestampType"  and should 
> consider time zone for manipulation
> Source code2:
> Dataset dataset = getSqlContext().read()
> .option(DAWBConstant.PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DAWBConstant.DELIMITER, ",")
> .option(DAWBConstant.QUOTE, "\"")
> .option(DAWBConstant.ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-'T'HH:mm:ss")
> .option(DAWBConstant.MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but 
> expected result is TimeCoumn should consider time zone for manipulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-29 Thread Navya Krishnappa (JIRA)
Navya Krishnappa created SPARK-20152:


 Summary: Time zone is not respected while parsing csv for 
timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
 Key: SPARK-20152
 URL: https://issues.apache.org/jira/browse/SPARK-20152
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Navya Krishnappa


When reading the below-mentioned time value by specifying the "timestampFormat" 
"MM-dd-yyyy'T'HH:mm:ss.SSSZZ", the time zone is ignored.

Sample data: 
TimeColumn
03-21-2017T03:30:02Z


Result: TimeColumn is of StringType and the value is "03-21-2017T03:30:02Z"

Expected Result: TimeColumn should be of TimestampType



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api

2017-02-14 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867357#comment-15867357
 ] 

Navya Krishnappa edited comment on SPARK-19442 at 2/15/17 7:04 AM:
---

Thank you [~hyukjin.kwon]. 

It is working as per my requirement. I could create a new column with blank 
values. :) 


was (Author: navya krishnappa):
Thank you [~hyukjin.kwon]. 

It is satisfied my requirement. I could create a new column with blank values. 
:) 

> Unable to add column to the dataset using Dataset.WithColumn() api
> --
>
> Key: SPARK-19442
> URL: https://issues.apache.org/jira/browse/SPARK-19442
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When I'm creating a new column using Dataset.WithColumn() api, Analysis 
> Exception is thrown.
> Dataset.WithColumn() api: 
> Dataset.withColumn("newColumnName', new 
> org.apache.spark.sql.Column("newColumnName").cast("int"));
> Stacktrace: 
> cannot resolve '`NewColumn`' given input columns: [abc,xyz ]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api

2017-02-14 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867357#comment-15867357
 ] 

Navya Krishnappa commented on SPARK-19442:
--

Thank you [~hyukjin.kwon]. 

It satisfied my requirement. I could create a new column with blank values. :)

> Unable to add column to the dataset using Dataset.WithColumn() api
> --
>
> Key: SPARK-19442
> URL: https://issues.apache.org/jira/browse/SPARK-19442
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When I'm creating a new column using Dataset.WithColumn() api, Analysis 
> Exception is thrown.
> Dataset.WithColumn() api: 
> Dataset.withColumn("newColumnName', new 
> org.apache.spark.sql.Column("newColumnName").cast("int"));
> Stacktrace: 
> cannot resolve '`NewColumn`' given input columns: [abc,xyz ]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api

2017-02-13 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865135#comment-15865135
 ] 

Navya Krishnappa commented on SPARK-19442:
--

If the source file has 3 columns:

Name   Age   Address
Abc    10    Bangalore
Xyz    10    Bangalore

then after adding a new column, say "State", the resultant dataset should be 
(with "State" left blank for the existing rows, as sketched in the snippet below):

Name   Age   Address     State
Abc    10    Bangalore
Xyz    10    Bangalore
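
A minimal sketch of one way to produce this (assuming functions.lit for the 
blank default; the variable names are placeholders):

import static org.apache.spark.sql.functions.lit;

// Add a brand-new "State" column holding a blank value for every existing row.
Dataset<Row> withState = dataset.withColumn("State", lit(""));
withState.show();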



> Unable to add column to the dataset using Dataset.WithColumn() api
> --
>
> Key: SPARK-19442
> URL: https://issues.apache.org/jira/browse/SPARK-19442
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When I'm creating a new column using Dataset.WithColumn() api, Analysis 
> Exception is thrown.
> Dataset.WithColumn() api: 
> Dataset.withColumn("newColumnName', new 
> org.apache.spark.sql.Column("newColumnName").cast("int"));
> Stacktrace: 
> cannot resolve '`NewColumn`' given input columns: [abc,xyz ]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api

2017-02-13 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863216#comment-15863216
 ] 

Navya Krishnappa edited comment on SPARK-19442 at 2/13/17 5:29 AM:
---

Thank you for your response. 

I was able to derive a new column from an existing column, but my intention is 
to add a brand-new column.


was (Author: navya krishnappa):
Thank you for your response. 

It is working as expected. I could able to add a new column to the data set.

> Unable to add column to the dataset using Dataset.WithColumn() api
> --
>
> Key: SPARK-19442
> URL: https://issues.apache.org/jira/browse/SPARK-19442
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When I'm creating a new column using Dataset.WithColumn() api, Analysis 
> Exception is thrown.
> Dataset.WithColumn() api: 
> Dataset.withColumn("newColumnName', new 
> org.apache.spark.sql.Column("newColumnName").cast("int"));
> Stacktrace: 
> cannot resolve '`NewColumn`' given input columns: [abc,xyz ]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api

2017-02-13 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa reopened SPARK-19442:
--

> Unable to add column to the dataset using Dataset.WithColumn() api
> --
>
> Key: SPARK-19442
> URL: https://issues.apache.org/jira/browse/SPARK-19442
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When I'm creating a new column using Dataset.WithColumn() api, Analysis 
> Exception is thrown.
> Dataset.WithColumn() api: 
> Dataset.withColumn("newColumnName', new 
> org.apache.spark.sql.Column("newColumnName").cast("int"));
> Stacktrace: 
> cannot resolve '`NewColumn`' given input columns: [abc,xyz ]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api

2017-02-13 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863216#comment-15863216
 ] 

Navya Krishnappa commented on SPARK-19442:
--

Thank you for your response. 

It is working as expected. I was able to add a new column to the dataset.

> Unable to add column to the dataset using Dataset.WithColumn() api
> --
>
> Key: SPARK-19442
> URL: https://issues.apache.org/jira/browse/SPARK-19442
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When I'm creating a new column using Dataset.WithColumn() api, Analysis 
> Exception is thrown.
> Dataset.WithColumn() api: 
> Dataset.withColumn("newColumnName', new 
> org.apache.spark.sql.Column("newColumnName").cast("int"));
> Stacktrace: 
> cannot resolve '`NewColumn`' given input columns: [abc,xyz ]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api

2017-02-02 Thread Navya Krishnappa (JIRA)
Navya Krishnappa created SPARK-19442:


 Summary: Unable to add column to the dataset using 
Dataset.WithColumn() api
 Key: SPARK-19442
 URL: https://issues.apache.org/jira/browse/SPARK-19442
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 2.0.2
Reporter: Navya Krishnappa


When I create a new column using the Dataset.withColumn() API, an 
AnalysisException is thrown.

Dataset.withColumn() API call: 

Dataset.withColumn("newColumnName", new 
org.apache.spark.sql.Column("newColumnName").cast("int"));


Stacktrace: 
cannot resolve '`NewColumn`' given input columns: [abc,xyz ]
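
For context, a minimal sketch (not from the report) of why the call above fails 
and of one possible alternative: new Column("newColumnName") is an unresolved 
reference to an existing column, so casting it cannot succeed when no such 
column exists, whereas a literal does not reference anything:

import static org.apache.spark.sql.functions.lit;

// Fails: the Column is a reference to a column that is not among the input columns.
// dataset.withColumn("newColumnName", new org.apache.spark.sql.Column("newColumnName").cast("int"));

// One alternative (an assumption, not necessarily the fix suggested in the thread):
// add the column as a typed null literal and fill it later as needed.
Dataset<Row> withNew = dataset.withColumn("newColumnName", lit(null).cast("int"));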



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18962) Unable to create parquet file for the given data

2016-12-21 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-18962:
-
Affects Version/s: 2.0.2

> Unable to create parquet file for the given data
> 
>
> Key: SPARK-18962
> URL: https://issues.apache.org/jira/browse/SPARK-18962
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When i'm trying to read the below mentioned csv source file and creating an 
> parquet file from that throws an java.lang.IllegalArgumentException: Invalid 
> DECIMAL scale: -9 exception.
> The source file content is 
> Row(column name)
> 9.03E+12
> 1.19E+11
> Refer the given code used read the csv file and creating an parquet file:
> //Read the csv file
> Dataset dataset = getSqlContext().read()
> .option(DAWBConstant.HEADER, "true")
> .option(DAWBConstant.PARSER_LIB, "commons")
> .option(DAWBConstant.INFER_SCHEMA, "true")
> .option(DAWBConstant.DELIMITER, ",")
> .option(DAWBConstant.QUOTE, "\"")
> .option(DAWBConstant.ESCAPE, "
> ")
> .option(DAWBConstant.MODE, Mode.PERMISSIVE)
> .csv(sourceFile)
> // create an parquet file
> dataset.write().parquet("//path.parquet")
> Stack trace:
> Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
> at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
> at 
> org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410)
> at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324)
> at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250)
> at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85)
> at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288)
> at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetFileFormat.scala:562)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
> at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18962) Unable to create parquet file for the given data

2016-12-21 Thread Navya Krishnappa (JIRA)
Navya Krishnappa created SPARK-18962:


 Summary: Unable to create parquet file for the given data
 Key: SPARK-18962
 URL: https://issues.apache.org/jira/browse/SPARK-18962
 Project: Spark
  Issue Type: Bug
Reporter: Navya Krishnappa


When I try to read the below-mentioned CSV source file and create a Parquet 
file from it, a java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 
exception is thrown.
The source file content is 
Row(column name)
9.03E+12
1.19E+11
Refer to the code below, used to read the CSV file and create the Parquet file:
//Read the csv file
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.HEADER, "true")
.option(DAWBConstant.PARSER_LIB, "commons")
.option(DAWBConstant.INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile)
// create an parquet file
dataset.write().parquet("//path.parquet")
Stack trace:
Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410)
at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324)
at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250)
at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
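
A possible workaround, sketched below under assumptions not stated in the report (the column name "Amount" is taken from the related SPARK-18877 discussion further down, and decimal(38,0) is just one valid target type), is to re-cast the inferred negative-scale decimal column to a type the Parquet schema converter accepts before writing:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.col;

// Assumed column name "Amount": replace the inferred decimal(3,-9) with a
// decimal whose scale is non-negative before the Parquet schema is built.
Dataset<Row> fixed = dataset.withColumn("Amount",
    col("Amount").cast(DataTypes.createDecimalType(38, 0)));
// Alternatively: .cast(DataTypes.DoubleType) if exact integral values are not required.
fixed.write().parquet("//path.parquet");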



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-18 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760487#comment-15760487
 ] 

Navya Krishnappa edited comment on SPARK-18877 at 12/19/16 7:56 AM:


Thank you [~dongjoon] and i will create an issue in Apace parquet.


was (Author: navya krishnappa):
Thank you [~dongjoon]

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading below mentioned csv data, even though the maximum decimal 
> precision is 38, following exception is thrown 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-18 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760487#comment-15760487
 ] 

Navya Krishnappa edited comment on SPARK-18877 at 12/19/16 7:56 AM:


Thank you [~dongjoon], and I will create an issue in the Apache Parquet JIRA.


was (Author: navya krishnappa):
Thank you [~dongjoon] and i will create an issue in Apace parquet.

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading below mentioned csv data, even though the maximum decimal 
> precision is 38, following exception is thrown 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-18 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760487#comment-15760487
 ] 

Navya Krishnappa commented on SPARK-18877:
--

Thank you [~dongjoon]

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading below mentioned csv data, even though the maximum decimal 
> precision is 38, following exception is thrown 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-16 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756554#comment-15756554
 ] 

Navya Krishnappa commented on SPARK-18877:
--

Thank you for replying [~dongjoon]. Can you help me understand whether the above-mentioned PR will resolve the below-mentioned issue?

I have another issue with respect to the decimal scale. When I try to read the below-mentioned CSV source file and create a Parquet file from it, a java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 exception is thrown.


The source file content is:
Row(column name)
9.03E+12
1.19E+11

Refer to the code below, used to read the CSV file and create the Parquet file:

// Read the csv file
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.HEADER, "true")
.option(DAWBConstant.PARSER_LIB, "commons")
.option(DAWBConstant.INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile)

// create a parquet file
dataset.write().parquet("//path.parquet")


Stack trace:

Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324)
at 
org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250)
at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
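
One way to avoid the negative-scale inference altogether, sketched below (the column name "Amount" and the choice of DoubleType are assumptions for illustration, not the reported code), is to drop the INFER_SCHEMA option and declare the schema explicitly:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// With an explicit schema, no decimal with a negative scale is ever inferred.
StructType schema = new StructType().add("Amount", DataTypes.DoubleType);

Dataset<Row> dataset = getSqlContext().read()
    .option(DAWBConstant.HEADER, "true")
    .schema(schema)
    .csv(sourceFile);

dataset.write().parquet("//path.parquet");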


> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading below mentioned csv data, even though the maximum decimal 
> precision is 38, following e

[jira] [Commented] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-16 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754865#comment-15754865
 ] 

Navya Krishnappa commented on SPARK-18877:
--

I'm reading the content with getSqlContext().read(). Refer to the code below, used to read the CSV file.

Dataset dataset = getSqlContext().read()
.option(DAWBConstant.HEADER, "true")
.option(DAWBConstant.PARSER_LIB, "commons")
.option(DAWBConstant.INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);

If we collect the dataset (dataset.collect()), I get a java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20 exception.
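
For what it's worth, a minimal sketch of a way to sidestep the inference (assuming the column from the sample data is named "Decimal" and that decimal(38,0) is wide enough; this is an illustration, not a confirmed fix) is to leave INFER_SCHEMA off so the column is read as a string, then cast it explicitly:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.col;

// Without inferSchema the column arrives as a string, so reading cannot fail;
// the explicit cast then targets a decimal wide enough for the sample values.
Dataset<Row> raw = getSqlContext().read()
    .option(DAWBConstant.HEADER, "true")
    .csv(sourceFile);

Dataset<Row> typed = raw.withColumn("Decimal",
    col("Decimal").cast(DataTypes.createDecimalType(38, 0)));

typed.collect(); // values such as 2323366225312000 fit within precision 38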

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading below mentioned csv data, even though the maximum decimal 
> precision is 38, following exception is thrown 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-15 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753728#comment-15753728
 ] 

Navya Krishnappa commented on SPARK-18877:
--

Precision and scale vary depending on the decimal values in the column. For example, if the source file contains

Amount (column name)
9.03E+12
1.19E+11
24335739714
1.71E+11

then Spark considers the Amount column to be decimal(3,-9) and throws the below-mentioned exception:

Caused by: java.lang.IllegalArgumentException: requirement failed: Decimal 
precision 4 exceeds max precision 3
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:112)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:425)
at 
org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:264)
at 
org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:116)
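
A small, standard-library-only illustration of where a negative scale like -9 comes from: java.math.BigDecimal parses scientific-notation literals with a negative scale, and the CSV schema inference derives the column's decimal type from these per-value precisions and scales.

import java.math.BigDecimal;

BigDecimal d = new BigDecimal("1.19E+11");
System.out.println(d.precision()); // 3  -> unscaled value is 119
System.out.println(d.scale());     // -9 -> the value is 119 * 10^9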



> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading below mentioned csv data, even though the maximum decimal 
> precision is 38, following exception is thrown 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-15 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753716#comment-15753716
 ] 

Navya Krishnappa commented on SPARK-18877:
--

I'm reading through the CSV reader (.csv(sourceFile)) and I'm not setting any precision or scale; Spark automatically detects the precision and scale for the values in the source file, and they vary depending on the decimal values in the column.

Stack trace:

Caused by: java.lang.IllegalArgumentException: requirement failed: Decimal 
precision 28 exceeds max precision 20
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:112)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:425)
at 
org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:264)
at 
org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:116)
at 
org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:85)
at 
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:128)
at 
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:127)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 common frames omitted
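
As a quick diagnostic (a sketch using the standard CSV reader option names, which the DAWBConstant keys in the reported code presumably map to), the inferred type can be inspected before any action forces the values to be parsed, which shows the decimal type that the cast in CSVTypeCast.castTo is later checked against:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> dataset = getSqlContext().read()
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(sourceFile);

// Inspect what inference produced before collect() materializes the rows.
dataset.printSchema();                              // tree form, shows the column's DecimalType
System.out.println(dataset.schema().prettyJson());  // the same schema as JSON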

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading below mentioned csv data, even though the maximum decimal 
> precision is 38, following exception is thrown 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-15 Thread Navya Krishnappa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navya Krishnappa updated SPARK-18877:
-
Affects Version/s: 2.0.2

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading below mentioned csv data, even though the maximum decimal 
> precision is 38, following exception is thrown 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-14 Thread Navya Krishnappa (JIRA)
Navya Krishnappa created SPARK-18877:


 Summary: Unable to read given csv data. Excepion: 
java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
exceeds max precision 20
 Key: SPARK-18877
 URL: https://issues.apache.org/jira/browse/SPARK-18877
 Project: Spark
  Issue Type: Bug
Reporter: Navya Krishnappa


When reading the below-mentioned CSV data, even though the maximum decimal precision is 38, the following exception is thrown:
java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20


Decimal
2323366225312000
2433573971400
23233662253000
23233662253





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org