Re: [PR] [SPARK-52482][SQL][CORE] ZStandard support for file data source reader [spark]

via GitHub Thu, 19 Jun 2025 00:12:57 -0700


pan3793 commented on PR #51182:
URL: https://github.com/apache/spark/pull/51182#issuecomment-2986934134


   OK, it's fair enough to fork Hadoop's `LineRecordReader` and use try-catch 
logic for codec fallback, given your flexible, extensible design.
   
   > There will be some performance penalty for examining the magic number and 
reopening the file input stream with appropriate codec.
   
   A wrapper codec that looks ahead at the magic number and then hands the 
InputStream to the concrete codec should eliminate the re-open cost. Anyway, 
this is an implementation detail and can be discussed later.
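   
   To make the look-ahead idea concrete, here is a rough sketch that uses only 
the JDK. It is not the PR's code and not a Hadoop `CompressionCodec`; the class 
name `MagicSniffer` and its methods are hypothetical. It peeks at the first 
bytes through a `PushbackInputStream`, compares them with the zstd frame magic 
(0xFD2FB528, little-endian on disk) and the gzip magic (0x1F 0x8B), then pushes 
the bytes back so the chosen decompressor can read the same stream from the 
start.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, for illustration only.
final class MagicSniffer {
    // Zstandard frame magic number 0xFD2FB528, stored little-endian.
    private static final byte[] ZSTD_MAGIC =
        {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD};
    // Gzip member header starts with 0x1F 0x8B.
    private static final byte[] GZIP_MAGIC = {(byte) 0x1F, (byte) 0x8B};
    private static final int MAX_MAGIC_LEN = 4;

    private MagicSniffer() {}

    // Wrap the raw stream with a pushback buffer large enough for the peek.
    static PushbackInputStream wrap(InputStream raw) {
        return new PushbackInputStream(raw, MAX_MAGIC_LEN);
    }

    // Returns "zstd", "gzip", or "none"; the stream position is unchanged.
    static String detect(PushbackInputStream in) throws IOException {
        byte[] head = new byte[MAX_MAGIC_LEN];
        int n = 0;
        while (n < MAX_MAGIC_LEN) {
            int r = in.read(head, n, MAX_MAGIC_LEN - n);
            if (r < 0) {
                break; // stream shorter than the longest magic number
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // replay the peeked bytes to the next reader
        }
        if (startsWith(head, n, ZSTD_MAGIC)) {
            return "zstd";
        }
        if (startsWith(head, n, GZIP_MAGIC)) {
            return "gzip";
        }
        return "none";
    }

    private static boolean startsWith(byte[] head, int len, byte[] magic) {
        if (len < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

   A caller would do `PushbackInputStream in = MagicSniffer.wrap(fileIn)`, 
switch on `MagicSniffer.detect(in)`, and pass `in` to whichever codec's 
`createInputStream` matches, so the file is opened only once.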
   
   > For compatibility check, I have added some more tests that reads files 
compressed with ZSTD in ubuntu (version: 1.4.4+dfsg-3ubuntu0.1).
   
   I think you should at least test reading a zstd-compressed text file 
written by Hadoop.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


