[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

via GitHub Fri, 04 Aug 2023 03:08:12 -0700


pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665362216


   @wangyum @yaooqinn I agree with your opinion that we should follow the Hive
behavior as much as possible. Meanwhile, Spark also aims to reduce the
differences between DS and Hive. As you can see, the file name pattern is
already not the same as Hive's; it follows DS instead.
   
   - written via Spark: `hive_orc/part-00000-5a481e57-caf3-471c-9cf3-0ec26e94e7a3-c000`
   - written via Hive: `hive_orc/000000_0`
   
   WDYT about adding a configuration for this feature, disabled by default?
   
   For the Parquet/ORC formats, the file name does not affect decoding, since
the compression information is stored in the file's own metadata.
   
   Given that DS file names make it much easier for administrators to identify
the format and compression codec, I would like Spark to have the same ability
here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.