[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)
[ https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238971#comment-17238971 ]

Dongjoon Hyun commented on SPARK-26645:
---------------------------------------

This landed in `branch-2.4` via https://github.com/apache/spark/pull/30503 .

> CSV infer schema bug infers decimal(9,-1)
> -----------------------------------------
>
>                 Key: SPARK-26645
>                 URL: https://issues.apache.org/jira/browse/SPARK-26645
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.4.7
>            Reporter: Ohad Raviv
>            Assignee: Marco Gaido
>            Priority: Minor
>             Fix For: 2.4.8, 3.0.0
>
>
> We have a file /tmp/t1/file.txt that contains only one line: "1.18927098E9".
> Running:
> {code:python}
> df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
> print(df.dtypes)
> {code}
> causes:
> {noformat}
> ValueError: Could not parse datatype: decimal(9,-1)
> {noformat}
> I'm not sure where the bug is - inferSchema or dtypes?
> I saw that it is legal to have a decimal with negative scale in the code
> (CSVInferSchema.scala):
> {code:scala}
> if (bigDecimal.scale <= 0) {
>   // `DecimalType` conversion can fail when
>   // 1. The precision is bigger than 38.
>   // 2. The scale is bigger than the precision.
>   DecimalType(bigDecimal.precision, bigDecimal.scale)
> }
> {code}
> but what does it mean?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
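For context on the inferred decimal(9,-1): a negative scale means the unscaled value is multiplied by a positive power of ten, so "1.18927098E9" is 118927098 × 10^1 with 9 significant digits. Python's standard decimal module exposes the same bookkeeping (its exponent is the negation of Java BigDecimal's scale), so it can serve as a quick sanity check:

```python
from decimal import Decimal

# "1.18927098E9" == 118927098 * 10**1: nine significant digits,
# and (in Java BigDecimal terms) scale = -exponent = -1.
d = Decimal("1.18927098E9")
sign, digits, exponent = d.as_tuple()

precision = len(digits)  # number of significant digits
scale = -exponent        # Java BigDecimal scale convention

print(precision, scale)  # -> 9 -1
print(int(d))            # -> 1189270980
```

So the Scala inference side is internally consistent: DecimalType(9, -1) is a legal type there; the ValueError comes later, when the type string is parsed back on the Python side.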
[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)
[ https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238878#comment-17238878 ]

Apache Spark commented on SPARK-26645:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30503
[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)
[ https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238876#comment-17238876 ]

Dongjoon Hyun commented on SPARK-26645:
---------------------------------------

Okay. Let me make a backport PR to branch-2.4, [~bullsoverbears].
[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)
[ https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238798#comment-17238798 ]

Punit Shah commented on SPARK-26645:
------------------------------------

Hello [~dongjoon], if we can get this PR backported, it would be tremendously helpful.
[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)
[ https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235757#comment-17235757 ]

Dongjoon Hyun commented on SPARK-26645:
---------------------------------------

Do you need this, [~bullsoverbears]?
[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)
[ https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745146#comment-16745146 ]

Marco Gaido commented on SPARK-26645:
-------------------------------------

The error is on the Python side; I will submit a PR shortly. Thanks for reporting this.
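For readers hitting the same ValueError: the failure is in PySpark's parsing of the inferred type string, not in the Scala inference itself. A minimal, hypothetical sketch of the kind of fix involved (the pattern names below are illustrative, not PySpark's actual identifiers; see the PR linked in this ticket for the real change): a decimal(p,s) regex that only accepts a non-negative scale rejects "decimal(9,-1)", while allowing an optional minus sign on the scale makes the parse succeed.

```python
import re

# Hypothetical sketch: a decimal(p, s) matcher that only accepts a
# non-negative scale fails on the inferred type string "decimal(9,-1)".
OLD_DECIMAL = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")

# Allowing an optional leading minus on the scale group fixes the parse.
FIXED_DECIMAL = re.compile(r"decimal\(\s*(\d+)\s*,\s*(-?\d+)\s*\)")

type_string = "decimal(9,-1)"
assert OLD_DECIMAL.match(type_string) is None  # the reported ValueError path

m = FIXED_DECIMAL.match(type_string)
precision, scale = int(m.group(1)), int(m.group(2))
print(precision, scale)  # -> 9 -1
```

This also explains the reporter's question about where the bug lives: inferSchema legitimately produces DecimalType(9, -1); it is the round-trip through the type-string parser behind dtypes that rejected the negative scale.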