[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation

2020-12-07 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245293#comment-17245293
 ] 

Sean R. Owen commented on SPARK-33395:
--

You can't have a column with varying type. Reading this as a Decimal type is 
how you'd have to do this. You can of course assess and change the precision in 
a UDF you write.
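As a rough illustration of the suggestion above, the sketch below uses plain java.math.BigDecimal rather than the Spark API. The class and method names (FixedScaleDecimal, toFixedScale, toOriginalForm) and the scale of 14 are assumptions made for this example only; the logic is the kind of thing a user-written UDF could wrap.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class FixedScaleDecimal {
    // Hypothetical column scale: 14 fractional digits, enough for the longest sample value.
    static final int SCALE = 14;

    // Parse the text exactly and pad it to the fixed scale, the way a decimal column holds it.
    // RoundingMode.UNNECESSARY makes this throw instead of silently rounding.
    static BigDecimal toFixedScale(String text) {
        return new BigDecimal(text.trim()).setScale(SCALE, RoundingMode.UNNECESSARY);
    }

    // Drop the padding zeros again so the value prints without exponent or padding.
    static String toOriginalForm(BigDecimal value) {
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        String[] samples = {"1200404151072.1211", "1200404151073", "1200404151079.12544545454554"};
        for (String s : samples) {
            BigDecimal fixed = toFixedScale(s);
            System.out.println(fixed.toPlainString() + " -> " + toOriginalForm(fixed));
        }
    }
}
```

Note that stripping trailing zeros also removes zeros that were present in the file, so a value written as 1.50 would come back as 1.5.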

> Spark reading data in scientific notation
> -----------------------------------------
>
>                 Key: SPARK-33395
>                 URL: https://issues.apache.org/jira/browse/SPARK-33395
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.4, 2.4.8, 3.1.0
>            Reporter: Nilesh Patil
>            Priority: Major
>
> The file contains the data below:
> DAta
> 1200404151072.1211
> 1200404151073
> 1200404151074.1232323
> 1200404151075.124344
> 1200404151076.12
> 1200404151077.12343
> 1200404151078.12
> 1200404151079.12544545454554
> 1251080.123444
> 1
>
> Spark reads these values in scientific notation. We want to read the data 
> exactly as it appears in the file, with an accurate numeric data type rather 
> than as a string.
> +--------------------+
> |                DAta|
> +--------------------+
> |1.200404151072121E12|
> |   1.200404151073E12|
> |1.200404151074123...|
> |1.200404151075124...|
> | 1.20040415107612E12|
> |1.200404151077123...|
> | 1.20040415107812E12|
> |1.200404151079125...|
> |      1251080.123445|
> |              1.0E28|
> +--------------------+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation

2020-11-16 Thread Nilesh Patil (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233279#comment-17233279
 ] 

Nilesh Patil commented on SPARK-33395:
--

With a decimal type, the scale and precision are fixed for all rows, whereas 
each row in our data may have a different scale and precision.
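To make the point about per-row scale and precision concrete, here is a small sketch in plain Java (not Spark code) that measures each sample value as listed in the report and derives the smallest single fixed decimal type that would hold all of them exactly. The class and method names are hypothetical. For reference, Spark's decimal type allows a precision of at most 38.

```java
import java.math.BigDecimal;

class DecimalBounds {
    // Sample values exactly as listed in the issue description.
    static final String[] SAMPLES = {
        "1200404151072.1211", "1200404151073", "1200404151074.1232323",
        "1200404151075.124344", "1200404151076.12", "1200404151077.12343",
        "1200404151078.12", "1200404151079.12544545454554", "1251080.123444", "1"
    };

    // Returns {precision, scale} of the smallest fixed decimal that holds every value exactly.
    static int[] commonPrecisionAndScale(String[] values) {
        int maxIntegerDigits = 0;
        int maxScale = 0;
        for (String v : values) {
            BigDecimal d = new BigDecimal(v.trim());
            // Digits after the decimal point (never negative for this purpose).
            int scale = Math.max(d.scale(), 0);
            // Digits before the decimal point.
            int integerDigits = Math.max(d.precision() - d.scale(), 0);
            maxScale = Math.max(maxScale, scale);
            maxIntegerDigits = Math.max(maxIntegerDigits, integerDigits);
        }
        return new int[] {maxIntegerDigits + maxScale, maxScale};
    }

    public static void main(String[] args) {
        for (String v : SAMPLES) {
            BigDecimal d = new BigDecimal(v);
            System.out.println(v + " precision=" + d.precision() + " scale=" + d.scale());
        }
        int[] common = commonPrecisionAndScale(SAMPLES);
        System.out.println("common: decimal(" + common[0] + "," + common[1] + ")");
    }
}
```

Each row has its own precision and scale, but one wide enough type covers them all; the fixed scale then shows up as trailing zeros on the shorter values.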




[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation

2020-11-16 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233276#comment-17233276
 ] 

Takeshi Yamamuro commented on SPARK-33395:
--

How about using a decimal type instead? 




[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation

2020-11-16 Thread Nilesh Patil (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233271#comment-17233271
 ] 

Nilesh Patil commented on SPARK-33395:
--

Could there be a method that keeps the numeric data type but displays the 
data in its original form? Something like this. This issue has been reported 
by three of our clients in production, so we would like it treated as high 
priority.
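One way to picture "keep the type, change only the display" is the sketch below, in plain Java rather than Spark. The helper name toPlain is hypothetical. It formats a double without exponent notation, using the same digits that Double.toString would print.

```java
import java.math.BigDecimal;

class PlainDoubleFormat {
    // Render a double without exponent notation.
    // BigDecimal.valueOf goes through Double.toString, so no extra binary digits appear.
    static String toPlain(double value) {
        return BigDecimal.valueOf(value).toPlainString();
    }

    public static void main(String[] args) {
        double[] values = {1.200404151073E12, 1.20040415107612E12, 1.0E28};
        for (double v : values) {
            System.out.println(v + " -> " + toPlain(v));
        }
    }
}
```

This changes only how the value is printed. Digits that did not fit into the double when the file was parsed are already gone and cannot be recovered by formatting.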




[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation

2020-11-16 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233267#comment-17233267
 ] 

Takeshi Yamamuro commented on SPARK-33395:
--

Hm, but I think we cannot avoid the rounding in this case. Any ideas? Either 
way, I think this is expected behaviour, so I will change "Bug" -> 
"Improvement" now.

> Spark reading data in scientific notation
> -----------------------------------------
>
>                 Key: SPARK-33395
>                 URL: https://issues.apache.org/jira/browse/SPARK-33395
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.4, 2.4.8, 3.1.0
>            Reporter: Nilesh Patil
>            Priority: Critical
>
> The file contains the data below:
> DAta
> 1200404151072.1211
> 1200404151073
> 1200404151074.1232323
> 1200404151075.124344
> 1200404151076.12
> 1200404151077.12343
> 1200404151078.12
> 1200404151079.12544545454554
> 1251080.123444
> 1
>
> Spark reads these values in scientific notation. We want to read the data 
> exactly as it appears in the file, with an accurate numeric data type rather 
> than as a string.
> +--------------------+
> |                DAta|
> +--------------------+
> |1.200404151072121E12|
> |   1.200404151073E12|
> |1.200404151074123...|
> |1.200404151075124...|
> | 1.20040415107612E12|
> |1.200404151077123...|
> | 1.20040415107812E12|
> |1.200404151079125...|
> |      1251080.123445|
> |              1.0E28|
> +--------------------+






[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation

2020-11-16 Thread Nilesh Patil (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233264#comment-17233264
 ] 

Nilesh Patil commented on SPARK-33395:
--

Yes, the inferred type is double, but we want the data exactly as it appears 
in the file: not in scientific notation and not rounded.




[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation

2020-11-16 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233263#comment-17233263
 ] 

Takeshi Yamamuro commented on SPARK-33395:
--

The inferred type is double, so the values are approximate. What do you 
suggest here? Do you think we should use decimal in this case instead?
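The effect described here can be reproduced outside Spark with plain Java. The sketch below (class and method names are hypothetical) parses values from the report as double and shows both the scientific notation and the loss of digits.

```java
import java.math.BigDecimal;

class DoubleApproximation {
    // The exact decimal expansion of the double nearest to the given text.
    static String exactStoredValue(String text) {
        return new BigDecimal(Double.parseDouble(text)).toPlainString();
    }

    // True when two different decimal strings collapse to the same double.
    static boolean sameDouble(String a, String b) {
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    public static void main(String[] args) {
        String longValue = "1200404151079.12544545454554";
        // Double.toString switches to exponent notation for magnitudes of 10^7 and above.
        System.out.println(Double.parseDouble("1200404151073"));
        // What the double really holds for the longest sample value.
        System.out.println(exactStoredValue(longValue));
        // The trailing digits of the file value do not survive the conversion.
        System.out.println(sameDouble(longValue, "1200404151079.1254"));
    }
}
```

At this magnitude adjacent doubles are 2^-12 (about 0.00024) apart, so anything beyond roughly the fourth fractional digit is rounded away.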

> Spark reading data in scientific notation
> -----------------------------------------
>
>                 Key: SPARK-33395
>                 URL: https://issues.apache.org/jira/browse/SPARK-33395
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.4
>            Reporter: Nilesh Patil
>            Priority: Critical
>
> The file contains the data below:
> DAta
> 1200404151072.1211
> 1200404151073
> 1200404151074.1232323
> 1200404151075.124344
> 1200404151076.12
> 1200404151077.12343
> 1200404151078.12
> 1200404151079.12544545454554
> 1251080.123444
> 1
>
> Spark reads these values in scientific notation. We want to read the data 
> exactly as it appears in the file, with an accurate numeric data type rather 
> than as a string.
> +--------------------+
> |                DAta|
> +--------------------+
> |1.200404151072121E12|
> |   1.200404151073E12|
> |1.200404151074123...|
> |1.200404151075124...|
> | 1.20040415107612E12|
> |1.200404151077123...|
> | 1.20040415107812E12|
> |1.200404151079125...|
> |      1251080.123445|
> |              1.0E28|
> +--------------------+






[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation

2020-11-16 Thread Nilesh Patil (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233249#comment-17233249
 ] 

Nilesh Patil commented on SPARK-33395:
--

[~zhangway], 

Hi, I am expecting output like the below.

DAta
1200404151072.1211
1200404151073
1200404151074.1232323
1200404151075.124344
1200404151076.12
1200404151077.12343
1200404151078.12
1200404151079.12544545454554
1251080.123444
1

 

Code we are using:

dataset = sparkSession.read().option("header", true).option("multiLine", true) 
.option("inferSchema", true).csv(filePathSeq);

 




[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation

2020-11-13 Thread Wei Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231928#comment-17231928
 ] 

Wei Zhang commented on SPARK-33395:
---

Hi, I would like to know what output you expect. Currently it looks like this:

 
|1.200404151072121E12|
|1.200404151073E12|
|1.200404151074123...|
|1.200404151075124...|
|1.20040415107612E12|
|1.200404151077123...|
|1.20040415107812E12|
|1.200404151079125...|
|1251080.123445|
|1.0E28|

 

Can you share the code you use to read the data from the file?
