Re: [I] [VL] No magic bytes found at end of the Parquet file [incubator-gluten]
beliefer closed issue #11062: [VL] No magic bytes found at end of the Parquet file URL: https://github.com/apache/incubator-gluten/issues/11062 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] [VL] No magic bytes found at end of the Parquet file [incubator-gluten]
beliefer commented on issue #11062: URL: https://github.com/apache/incubator-gluten/issues/11062#issuecomment-3544909271 @FelixYBW I have been trying to fix this issue. Please help me review it once I submit it.
Re: [I] [VL] No magic bytes found at end of the Parquet file [incubator-gluten]
FelixYBW commented on issue #11062: URL: https://github.com/apache/incubator-gluten/issues/11062#issuecomment-3535385478 @rui-mo mentioned that when Spark passes the split info to Gluten, it includes the file format info, so Velox uses the corresponding file reader to open the split. If we see an issue here, it is most likely because Spark passes the wrong file format info for some reason. Velox itself can't detect the file format and create the right file reader. I'm not sure whether a partition with mixed Parquet and ORC files causes issues in Velox. We never tested it. If it does, we may repartition the files and make sure some partitions only have Parquet files and others only have ORC files. @jinchengchenghh
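For illustration: the error in the subject line is what a Parquet reader reports when the file it was handed does not end with the 4-byte Parquet magic `PAR1`, which is what happens when an ORC file (which starts with the 3-byte magic `ORC`) is opened as Parquet. The sketch below shows how a file's format could be checked from those magic bytes alone. It is not how Gluten or Velox decide the format (per the comment above, the format comes from Spark's split info); `sniff_format` is a hypothetical helper name, and the check covers only the plain magic values.

```python
import os

PARQUET_MAGIC = b"PAR1"  # 4 bytes at both the start and the end of a Parquet file
ORC_MAGIC = b"ORC"       # 3 bytes at the start of an ORC file


def sniff_format(path):
    """Hypothetical helper: return 'parquet', 'orc', or 'unknown' from magic bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        tail = b""
        # A Parquet file needs room for both the leading and the trailing magic.
        if size >= 8:
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head[:3] == ORC_MAGIC:
        return "orc"
    return "unknown"
```

A check like this only identifies the container format; it says nothing about whether the rest of the file is valid.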
Re: [I] [VL] No magic bytes found at end of the Parquet file [incubator-gluten]
beliefer commented on issue #11062: URL: https://github.com/apache/incubator-gluten/issues/11062#issuecomment-3526121908 @FelixYBW But this table has many partitions, and each of them has a different format (Parquet or ORC). The data size is big, so we do not have enough compute resources to convert it. Vanilla Spark supports this scenario.
Re: [I] [VL] No magic bytes found at end of the Parquet file [incubator-gluten]
FelixYBW commented on issue #11062: URL: https://github.com/apache/incubator-gluten/issues/11062#issuecomment-3524753918 It may be easier to support if we can make sure each Spark partition contains either all Parquet files or all ORC files; then we pass a single file format to Velox.
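The grouping idea in this comment can be sketched as follows: bucket the files by format so that every bucket holds a single format and could be handed to one reader type. This is only an illustration of the idea, not Gluten's split planning; `group_by_format` and the example paths are hypothetical, and the format of each file is assumed to be already known (for example from the table's partition metadata).

```python
from collections import defaultdict


def group_by_format(files):
    """Hypothetical helper: bucket (path, fmt) pairs so each bucket has one format.

    Returns a dict mapping format name to the list of paths in input order.
    """
    groups = defaultdict(list)
    for path, fmt in files:
        groups[fmt].append(path)
    return dict(groups)


# Example input: a table whose partitions were written in different formats.
files = [
    ("/warehouse/t/dt=1/part-0", "parquet"),
    ("/warehouse/t/dt=2/part-0", "orc"),
    ("/warehouse/t/dt=1/part-1", "parquet"),
]
groups = group_by_format(files)
```

With the example input, `groups` has two entries, one per format, and neither bucket mixes Parquet and ORC files.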
Re: [I] [VL] No magic bytes found at end of the Parquet file [incubator-gluten]
beliefer commented on issue #10784: URL: https://github.com/apache/incubator-gluten/issues/10784#issuecomment-3509731040 @jinshuangxian I met the same issue. How did you fix it?
