[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245293#comment-17245293 ]

Sean R. Owen commented on SPARK-33395:
--------------------------------------

You can't have a column with varying type. Reading this as a Decimal type is how you'd have to do this. You can of course assess and change the precision in a UDF you write.

> Spark reading data in scientific notation
> -----------------------------------------
>
>              Key: SPARK-33395
>              URL: https://issues.apache.org/jira/browse/SPARK-33395
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 2.3.4, 2.4.8, 3.1.0
>         Reporter: Nilesh Patil
>         Priority: Major
>
> The file contains the following data:
>
> DAta
> 1200404151072.1211
> 1200404151073
> 1200404151074.1232323
> 1200404151075.124344
> 1200404151076.12
> 1200404151077.12343
> 1200404151078.12
> 1200404151079.12544545454554
> 1251080.123444
> 1
>
> Spark reads these values in scientific notation. We want to read the data exactly as it appears in the file, with an accurate numeric data type rather than a string data type.
>
> +--------------------+
> |                DAta|
> +--------------------+
> |1.200404151072121E12|
> |   1.200404151073E12|
> |1.200404151074123...|
> |1.200404151075124...|
> | 1.20040415107612E12|
> |1.200404151077123...|
> | 1.20040415107812E12|
> |1.200404151079125...|
> |      1251080.123445|
> |              1.0E28|
> +--------------------+

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
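Sean R. Owen's point, that one column needs one type, can be made concrete. A single decimal(precision, scale) type holds all of the sample values exactly if two conditions hold:

- its scale covers the longest fractional part;
- its precision covers the longest integer part plus that scale.

The Java sketch below computes that pair with java.math.BigDecimal. The class and method names are hypothetical and nothing here calls Spark. The only Spark fact assumed is that DecimalType precision is capped at 38 digits.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Spark API): finds the narrowest single
// decimal(precision, scale) that holds every value of a column exactly.
class CommonDecimalType {

    // Spark's DecimalType allows at most 38 digits of precision.
    static final int MAX_PRECISION = 38;

    /** Returns {precision, scale} covering all of the given raw values. */
    static int[] requiredPrecisionAndScale(List<String> rawValues) {
        int maxScale = 0;          // most digits after the decimal point
        int maxIntegerDigits = 0;  // most digits before the decimal point
        for (String raw : rawValues) {
            BigDecimal value = new BigDecimal(raw.trim());
            // A negative scale (e.g. "1E+3") means no fractional digits.
            int scale = Math.max(0, value.scale());
            int integerDigits = Math.max(0, value.precision() - value.scale());
            maxScale = Math.max(maxScale, scale);
            maxIntegerDigits = Math.max(maxIntegerDigits, integerDigits);
        }
        int precision = Math.max(1, maxIntegerDigits + maxScale);
        if (precision > MAX_PRECISION) {
            throw new IllegalArgumentException(
                "values need precision " + precision + ", above the limit of " + MAX_PRECISION);
        }
        return new int[] {precision, maxScale};
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList(
            "1200404151072.1211", "1200404151073", "1200404151074.1232323",
            "1200404151075.124344", "1200404151076.12", "1200404151077.12343",
            "1200404151078.12", "1200404151079.12544545454554",
            "1251080.123444", "1");
        int[] ps = requiredPrecisionAndScale(column);
        System.out.println("decimal(" + ps[0] + "," + ps[1] + ")");
    }
}
```

Run on the ten values from the issue description, it prints decimal(27,14), which fits under the 38-digit cap. The resulting pair could then be used in an explicit schema instead of relying on inferSchema.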
[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233279#comment-17233279 ]

Nilesh Patil commented on SPARK-33395:
--------------------------------------

With a decimal type, the precision and scale are fixed for all rows, whereas the precision and scale of our values vary from row to row.
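The trade-off Nilesh Patil describes can be seen with plain java.math.BigDecimal, with no Spark involved. Holding every value at one column-wide scale pads the shorter values with trailing zeros. The numeric value is unchanged; only its text rendering differs.

The scale of 14 (the longest fractional part in the sample) and the names below are illustrative assumptions. setScale without a rounding mode throws instead of rounding, which makes lost digits visible in this sketch. How Spark treats a value with more fractional digits than the column's scale is not modelled here.

```java
import java.math.BigDecimal;

// Illustration only: every value is held at one column-wide scale,
// the way a decimal(p, s) column stores all of its rows.
class FixedScaleDemo {

    // setScale(int) without a rounding mode throws ArithmeticException
    // if digits would have to be dropped, so nothing is rounded silently.
    static BigDecimal toColumnScale(String raw, int columnScale) {
        return new BigDecimal(raw).setScale(columnScale);
    }

    public static void main(String[] args) {
        int columnScale = 14; // longest fractional part in the sample data
        for (String raw : new String[] {"1200404151073", "1200404151076.12", "1"}) {
            BigDecimal stored = toColumnScale(raw, columnScale);
            boolean sameValue = stored.compareTo(new BigDecimal(raw)) == 0;
            // Same number, but rendered with trailing zeros up to the column scale.
            System.out.println(raw + " -> " + stored.toPlainString() + " (same value: " + sameValue + ")");
        }
    }
}
```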
[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233276#comment-17233276 ]

Takeshi Yamamuro commented on SPARK-33395:
------------------------------------------

How about using a decimal type instead?
[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233271#comment-17233271 ]

Nilesh Patil commented on SPARK-33395:
--------------------------------------

Is there any method that would keep the data type but display the data in its original form, or something along those lines? This issue has been reported by three of our clients in production, so we would like to treat it as a high priority.
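One way to get the display Nilesh Patil asks for, while keeping an exact decimal value, is to format the value on output:

- stripTrailingZeros() removes the padding added by a fixed scale;
- toPlainString() never switches to exponent notation.

This is a hypothetical helper in plain Java, not a Spark API. Inside Spark the same logic would likely have to live in a UDF or in the client code that renders results.

One caveat: trailing zeros that were genuinely in the file (1.50, say) cannot be told apart from padding. If the exact original text matters, keeping the raw string column alongside the decimal is the safer choice.

```java
import java.math.BigDecimal;

// Hypothetical display helper (not a Spark API): keeps the exact decimal
// value but renders it without padding zeros or exponent notation.
class PlainDisplay {

    static String display(BigDecimal value) {
        // stripTrailingZeros() undoes the padding from a fixed scale.
        // It can leave a negative scale (1200404151070 becomes unscaled
        // 120040415107 with scale -1), which toString() would print in
        // E-notation; toPlainString() never does.
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal padded = new BigDecimal("1200404151076.12").setScale(14);
        System.out.println(padded.toPlainString()); // padded with zeros
        System.out.println(display(padded));        // 1200404151076.12
    }
}
```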
[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233267#comment-17233267 ]

Takeshi Yamamuro commented on SPARK-33395:
------------------------------------------

Hm, but I don't think we can avoid the rounding in this case. Any ideas? Either way, I think this is expected behaviour, so I will change the issue type from "Bug" to "Improvement" now.

> Spark reading data in scientific notation
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 2.3.4, 2.4.8, 3.1.0
>         Priority: Critical
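Takeshi Yamamuro's point about rounding follows from the double format itself. A 64-bit double has a 53-bit significand, roughly 15 to 17 significant decimal digits, while 1200404151079.12544545454554 has 27.

The sketch below compares two exact values: the nearest double, and the decimal written in the file. new BigDecimal(double) converts the binary value without any further rounding. Class and method names are illustrative.

```java
import java.math.BigDecimal;

// Illustration: compares a value as written in the file with the exact
// value of the nearest double, i.e. what a DoubleType column can hold.
class DoubleRounding {

    static BigDecimal exactDoubleValue(String raw) {
        // new BigDecimal(double) is an exact conversion of the binary value.
        return new BigDecimal(Double.parseDouble(raw));
    }

    static boolean survivesDouble(String raw) {
        return exactDoubleValue(raw).compareTo(new BigDecimal(raw)) == 0;
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"1200404151073", "1200404151079.12544545454554"}) {
            System.out.println(raw + " exact as double: " + survivesDouble(raw)
                + ", stored as " + exactDoubleValue(raw).toPlainString());
        }
    }
}
```

Whole numbers such as 1200404151073 survive, because every integer up to 2^53 is exactly representable. Values with long fractional parts at this magnitude do not.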
[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233264#comment-17233264 ]

Nilesh Patil commented on SPARK-33395:
--------------------------------------

Yes, the inferred type is double, but we want the data exactly as it is in the file: not in scientific notation and not rounded.
[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233263#comment-17233263 ]

Takeshi Yamamuro commented on SPARK-33395:
------------------------------------------

The inferred type is double, so these are approximate values. What do you suggest here? Do you think we should use decimal in this case instead?

> Spark reading data in scientific notation
>       Issue Type: Bug
>       Components: Spark Core
> Affects Versions: 2.3.4
>         Priority: Critical
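The scientific notation itself comes from how a double is turned into text. Java's Double.toString switches to computerized scientific notation for magnitudes of 10^7 and above, whereas BigDecimal.toPlainString never uses an exponent.

The sample output in the issue is consistent with Spark following the same rule for DoubleType: 1251080.123445 is shown plainly, while the values around 1.2E12 are not. That is an observation about the output, not a claim about Spark internals. The helper names are hypothetical.

```java
import java.math.BigDecimal;

// Illustration: the same text rendered via double and via BigDecimal.
class NotationDemo {

    // Double.toString uses E-notation for nonzero magnitudes >= 10^7 or < 10^-3.
    static String viaDouble(String raw) {
        return Double.toString(Double.parseDouble(raw));
    }

    // toPlainString never uses an exponent and keeps every digit of the input.
    static String viaDecimal(String raw) {
        return new BigDecimal(raw).toPlainString();
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"1251080.123444", "1200404151073", "1200404151079.12544545454554"}) {
            System.out.println(raw + " | double: " + viaDouble(raw) + " | decimal: " + viaDecimal(raw));
        }
    }
}
```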
[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233249#comment-17233249 ]

Nilesh Patil commented on SPARK-33395:
--------------------------------------

Hi [~zhangway], I am expecting output like the following:

DAta
1200404151072.1211
1200404151073
1200404151074.1232323
1200404151075.124344
1200404151076.12
1200404151077.12343
1200404151078.12
1200404151079.12544545454554
1251080.123444
1

The code we are using:

dataset = sparkSession.read()
    .option("header", true)
    .option("multiLine", true)
    .option("inferSchema", true)
    .csv(filePathSeq);
[jira] [Commented] (SPARK-33395) Spark reading data in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-33395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231928#comment-17231928 ]

Wei Zhang commented on SPARK-33395:
-----------------------------------

Hi, I would like to know what you expect. Is it like this:

|1.200404151072121E12|
|1.200404151073E12|
|1.200404151074123...|
|1.200404151075124...|
|1.20040415107612E12|
|1.200404151077123...|
|1.20040415107812E12|
|1.200404151079125...|
|1251080.123445|
|1.0E28|

Can you share the code you use to read the data from the file?