sivakumardg opened a new issue, #56738:
URL: https://github.com/apache/spark/issues/56738

   ## Problem Statement
    
   When reading Oracle NUMBER columns (without explicit precision/scale) via 
JDBC, Spark's OracleDialect maps them to DecimalType(38, 10) with a hardcoded 
scale of 10. This causes silent data precision loss for values that require 
more than 10 decimal places, with no warning or error to the user.
    
   ## Current Behavior
    
   Oracle bare NUMBER (no precision/scale defined) returns via JDBC metadata:
     - getPrecision() = 0
     - getScale()     = -127
    
   Spark's OracleDialect.scala handles this with a hardcoded scale of 10:
    
       case 0 => Option(DecimalType(DecimalType.MAX_PRECISION, 10))
    
   So any bare NUMBER column becomes Decimal(38, 10) in Spark — regardless of 
the actual data stored in the column.
    
   ## Steps to Reproduce
    
   Oracle setup:
    
       CREATE TABLE test_number (
         col1 NUMBER,  -- stores small decimal values
         col2 NUMBER   -- stores large integer values
       );
    
       INSERT INTO test_number VALUES (0.0000123456789123456, 
12345678901234567890);
    
   PySpark read:
    
       df = spark.read.format("jdbc") \
           .option("url", oracle_url) \
           .option("dbtable", "test_number") \
           .option("driver", "oracle.jdbc.driver.OracleDriver") \
           .load()
    
       df.printSchema()
       # root
       #  |-- col1: decimal(38,10)
       #  |-- col2: decimal(38,10)
    
       df.show(truncate=False)
       # col1: 0.0000123457           <- SILENT DATA LOSS!
       # original value: 0.0000123456789123456
       # col2: 12345678901234567890   <- OK
    
   ## Expected Behavior
    
   Spark should not silently truncate data. When the scale of a bare NUMBER 
column cannot be determined from JDBC metadata, the user should at minimum be 
warned that precision loss may occur, or provided a way to configure the 
fallback behavior.
    
   ## Why This is Critical
    
     - The scale of bare NUMBER varies per column in Oracle—hardcoding 10 
cannot be correct for all cases
     - The precision loss is completely silent—no warning, no error, no 
indication in the schema that data has been truncated
     - NUMBER(38, 25) works correctly -> Decimal(38, 25) in Spark, proving the 
issue is only with bare NUMBER
     - There is currently no global workaround across multiple tables without 
knowing all column names upfront
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to