[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-12 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360779#comment-16360779
 ] 

Sean Owen commented on SPARK-23370:
---

OK, that doesn't sound trivial. It seems like this would impose a non-trivial cost on 
all Oracle usage just to work around a rare bug that is already fixed in newer 
versions of Oracle's driver. I am not sure Spark should do this.

> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
> Reporter: Harleen Singh Mann
> Priority: Minor
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on JDBC read, Spark obtains the schema of a table using 
> resultSet.getMetaData.getColumnType.
> This works 99.99% of the time, except when a column of Number type has been 
> added to an Oracle table using an ALTER statement. This is essentially an 
> Oracle DB + JDBC bug that is documented in the Oracle KB, and patches exist: 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> As a result of the above-mentioned issue, Spark receives a size of 0 for the 
> field and defaults the field type to BigDecimal(30,10) instead of what it 
> actually should be. This is done in OracleDialect.scala. This may cause issues 
> in downstream applications, where relevant information may be lost due to the 
> changed precision and scale.
> _The versions that are affected are:_ 
>  _JDBC - Version: 11.2.0.1 and later [Release: 11.2 and later]_
>  _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_ 
> _[Release: 11.1 to 11.2]_
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns view. If we use this view to fetch the precision and 
> scale of Number columns, the above issue is mitigated.
>  
> I can implement the changes, but require some inputs on the approach from the 
> gatekeepers here.
>  PS. This is also my first Jira issue and my first fork of Spark, so I will 
> need some guidance along the way. (Yes, I am a newbie to this.) Thanks...






[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-11 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360257#comment-16360257
 ] 

Harleen Singh Mann commented on SPARK-23370:


# Yes, querying the table would mean a non-trivial performance impact.
# It works for all tables that the JDBC user has access to. For more 
information, refer to 
[https://docs.oracle.com/cd/B19306_01/server.102/b14237/statviews_2094.htm]

This is very similar to the INFORMATION_SCHEMA.COLUMNS table in MySQL.
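For illustration, here is a minimal sketch of the kind of dictionary lookup this 
would involve. It assumes an already-open java.sql.Connection named conn; the owner 
and table names are placeholders, and nothing here is existing Spark code:
{code:scala}
import java.sql.Connection
import scala.collection.mutable

// Look up precision/scale for NUMBER columns from Oracle's all_tab_columns view.
// The predicate on owner/table_name is pushed to the DB, so only rows for the
// concerned table come back.
def numberColumnsFromDictionary(conn: Connection, owner: String, table: String): Map[String, (Int, Int)] = {
  val ps = conn.prepareStatement(
    """SELECT column_name, data_precision, data_scale
      |FROM all_tab_columns
      |WHERE owner = ? AND table_name = ? AND data_type = 'NUMBER'""".stripMargin)
  ps.setString(1, owner)
  ps.setString(2, table)
  val rs = ps.executeQuery()
  val result = mutable.Map.empty[String, (Int, Int)]
  while (rs.next()) {
    // data_precision/data_scale are NULL for an unconstrained NUMBER, in which case
    // getInt returns 0, so callers still need a fallback for that case.
    result(rs.getString(1)) = (rs.getInt(2), rs.getInt(3))
  }
  rs.close()
  ps.close()
  result.toMap
}
{code}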




[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-11 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359957#comment-16359957
 ] 

Sean Owen commented on SPARK-23370:
---

I don't think that answers the question, which is probably one you're 
positioned to answer if you've evaluated this.

Now, you would make all Oracle queries perform an extra table query to look up 
the schema. That's going to have a non-trivial performance impact, right? Does it 
even always work?

These are the questions you'd need to address.




[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-10 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359788#comment-16359788
 ] 

Harleen Singh Mann commented on SPARK-23370:


This is as far as I understand it: 
 * JDBC driver: Once we create the result set object using the JDBC driver, it 
will contain all the actual data as well as the metadata for the concerned DB 
table. 
 * Query an additional view (all_tab_columns): This would entail creating another 
result set that captures the metadata for the concerned DB table as data 
(rows). Overhead:
 ** Connection: none, since it will use pooling.
 ** Retrieving the result: low impact, since we will push the predicate down to the 
DB to filter data only for the concerned table.

I believe the all_tab_columns metadata should be queried on the driver and 
broadcast to the executors; a rough sketch follows below. Does this make sense?

Can we get some inputs from someone else as well?
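Purely as an illustration of the driver-side idea, here is a tiny sketch that turns 
such dictionary metadata into a Spark schema. It assumes a lookup like the 
numberColumnsFromDictionary sketch earlier in this thread; the helper name and map 
shape are assumptions, not Spark APIs:
{code:scala}
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

// Given column -> (precision, scale) fetched once on the driver, build the schema
// instead of trusting the 0-sized metadata the driver reports for NUMBER columns.
def decimalSchemaFor(meta: Map[String, (Int, Int)]): StructType =
  StructType(meta.toSeq.map { case (name, (precision, scale)) =>
    // Fall back to a wide default when the dictionary reports no explicit
    // precision (an unconstrained NUMBER).
    val dt = if (precision > 0) DecimalType(precision, scale) else DecimalType(38, 10)
    StructField(name, dt, nullable = true)
  })
{code}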




[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-10 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359500#comment-16359500
 ] 

Sean Owen commented on SPARK-23370:
---

Overhead for the applications that will use Oracle from Spark. You're proposing 
making all Oracle connections query a table for the schema instead of getting it 
the usual way from the JDBC driver. What's the downside?




[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359200#comment-16359200
 ] 

Harleen Singh Mann commented on SPARK-23370:


[~srowen] Yes, this should be possible to implement in the Oracle JDBC dialect. I 
want to start working on it once we agree it adds value; a rough illustration of the 
dialect hook is below.

Do you mean overhead for Spark? Or for the Oracle DB? Or for the developer? haha
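For context, a hypothetical sketch of what "implement it in the Oracle JDBC dialect" 
means mechanically: a custom JdbcDialect registered alongside the built-in one. This 
only shows the extension point; getCatalystType receives no Connection, so how an 
all_tab_columns lookup would be wired in is exactly the open design question, and 
PatchedOracleDialect is an assumed name, not existing Spark code:
{code:scala}
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder}

object PatchedOracleDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    // size == 0 is the symptom described in this ticket. Returning Some(...) here
    // overrides the type mapping; returning None falls through to default handling.
    if (sqlType == Types.NUMERIC && size == 0) Some(DecimalType(38, 10)) else None
  }
}

// Register it so it is considered for jdbc:oracle URLs:
// JdbcDialects.registerDialect(PatchedOracleDialect)
{code}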




[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359047#comment-16359047
 ] 

Sean Owen commented on SPARK-23370:
---

It's possible to implement that just in the JDBC dialect for Oracle, I suppose. 
Is it extra overhead? That is, I wonder about leaving in a workaround that 
impacts all Oracle users for a long time.




[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358495#comment-16358495
 ] 

Harleen Singh Mann commented on SPARK-23370:


[~q79969786] Your suggestion would work, but only if one knows in advance that 
the Oracle DB contains a column of Number type that was created using an ALTER 
TABLE statement. This information is seldom available to developers.

[~srowen] True, it is an Oracle issue. If everyone agrees that Spark has 
nothing to do with it, we may close this issue as is.

However, I feel there may be merit in evaluating the way Spark fetches schema 
information over JDBC - i.e. resultSet.getMetaData.getColumnType vs. 
all_tab_columns (a sketch of the metadata path is below).

 

Thanks.
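To make the comparison concrete, here is a small sketch of the ResultSetMetaData 
path referred to above, assuming an open java.sql.Connection conn and a placeholder 
table name (the all_tab_columns alternative was sketched earlier in the thread):
{code:scala}
import java.sql.Connection

// Read column name / precision / scale via the driver's ResultSetMetaData, which is
// essentially what the JDBC read path relies on today. On the affected server/driver
// combinations, getPrecision comes back as 0 - the symptom this ticket describes.
def columnsViaMetaData(conn: Connection, table: String): Seq[(String, Int, Int)] = {
  val stmt = conn.createStatement()
  try {
    val rs = stmt.executeQuery(s"SELECT * FROM $table WHERE 1 = 0")
    val md = rs.getMetaData
    (1 to md.getColumnCount).map { i =>
      (md.getColumnName(i), md.getPrecision(i), md.getScale(i))
    }
  } finally {
    stmt.close()
  }
}
{code}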




[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358401#comment-16358401
 ] 

Yuming Wang commented on SPARK-23370:
-

Users can configure the column types like this now:
{code:scala}
import java.util.Properties

val props = new Properties()
props.put("customSchema", "ID decimal(38, 0), N1 int, N2 boolean")
val dfRead = spark.read.jdbc(jdbcUrl, "tableWithCustomSchema", props)
dfRead.show()
{code}
More details:
https://github.com/apache/spark/pull/18266
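For reference, the same customSchema setting can also be passed through the 
option-based reader API; this is just an equivalent spelling of the example above, 
assuming jdbcUrl points at the Oracle instance:
{code:scala}
val dfRead2 = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "tableWithCustomSchema")
  .option("customSchema", "ID decimal(38, 0), N1 int, N2 boolean")
  .load()
dfRead2.show()
{code}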
