MaxGekk opened a new pull request #31552:
URL: https://github.com/apache/spark/pull/31552


   ### What changes were proposed in this pull request?
   In the PR, I propose to modify `RandomDataGenerator.forType()` to generate only dates/timestamps that are valid in both the Julian and the Proleptic Gregorian calendars. Currently, it can produce a date (for example `1582-10-06`) that is valid in the Proleptic Gregorian calendar but cannot be saved to ORC files as is, because the ORC format (in fact, the ORC libraries) assumes the Julian calendar, more precisely the hybrid Julian/Gregorian calendar in which the dates `1582-10-05` through `1582-10-14` do not exist. So, Spark shifts `1582-10-06` to the next valid date, `1582-10-15`, while saving it to ORC files. As a consequence, the test fails because it compares the original date with the date loaded back from the ORC files.
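
   The idea of "valid in both calendars" can be illustrated with a minimal Java sketch. This is not the change made in the PR (which lives in Spark's Scala `RandomDataGenerator`). It assumes that the hybrid calendar of `java.util.GregorianCalendar`, with its default `1582-10-15` cutover, is a reasonable stand-in for the calendar the ORC libraries assume. The names `HybridCalendarDates`, `existsInHybridCalendar` and `randomDateValidInBothCalendars` are hypothetical. The sketch rejects a proleptic Gregorian date whose year/month/day does not exist in the hybrid calendar, and draws random dates until it finds one that does:

   ```java
   import java.time.LocalDate;
   import java.util.Calendar;
   import java.util.GregorianCalendar;
   import java.util.Random;
   import java.util.TimeZone;

   // Hypothetical helper, for illustration only (not Spark's API).
   final class HybridCalendarDates {
       // Sampling range in the proleptic Gregorian calendar: 0001-01-01 .. 9999-12-31.
       private static final long MIN_DAY = LocalDate.of(1, 1, 1).toEpochDay();
       private static final long MAX_DAY = LocalDate.of(9999, 12, 31).toEpochDay();

       private HybridCalendarDates() {}

       // Returns true if the year/month/day of a proleptic Gregorian date also
       // exists in the hybrid Julian/Gregorian calendar of java.util.GregorianCalendar
       // (default cutover 1582-10-15). Dates in the gap 1582-10-05..1582-10-14 and
       // dates before year 1 are rejected.
       static boolean existsInHybridCalendar(LocalDate date) {
           GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
           cal.clear();
           // Non-lenient mode rejects field values the calendar would otherwise
           // silently normalize (for example, by moving a gap date forward).
           cal.setLenient(false);
           cal.set(Calendar.YEAR, date.getYear());
           cal.set(Calendar.MONTH, date.getMonthValue() - 1); // Calendar months are 0-based
           cal.set(Calendar.DAY_OF_MONTH, date.getDayOfMonth());
           try {
               cal.getTimeInMillis(); // triggers computation and validation of the fields
               return true;
           } catch (IllegalArgumentException e) {
               return false;
           }
       }

       // Rejection sampling: draw uniformly from the range, retry on dates that
       // do not exist in the hybrid calendar.
       static LocalDate randomDateValidInBothCalendars(Random rnd) {
           while (true) {
               long day = MIN_DAY + rnd.nextInt((int) (MAX_DAY - MIN_DAY + 1));
               LocalDate candidate = LocalDate.ofEpochDay(day);
               if (existsInHybridCalendar(candidate)) {
                   return candidate;
               }
           }
       }
   }
   ```

   With this sketch, `existsInHybridCalendar(LocalDate.of(1582, 10, 6))` returns `false`, while `1582-10-04` and `1582-10-15` are accepted. Rejection sampling keeps the example short: only ten days of the whole range are excluded, so retries are rare. The sketch covers dates only and compares year/month/day fields; it does not model how Spark rebases values between the two calendars.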
   
   ### Why are the changes needed?
   The changes fix failures in `HiveOrcHadoopFsRelationSuite`. For instance, the test "test all data types" fails with the seed 610710213676:
   ```
   == Results ==
   !== Correct Answer - 20 ==    == Spark Answer - 20 ==
    struct<index:int,col:date>   struct<index:int,col:date>
   ...
   ![9,1582-10-06]               [9,1582-10-15]
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   By running the modified test suite:
   ```
   $ build/sbt -Phive -Phive-thriftserver "test:testOnly *HiveOrcHadoopFsRelationSuite"
   ```

