Re: [I] Cannot read Parquet files that do not specify Map keys as required [arrow-rs]
tustvold closed issue #5606: Cannot read Parquet files that do not specify Map keys as required URL: https://github.com/apache/arrow-rs/issues/5606 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
jupiter commented on issue #5606: URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2050107030 It works when removing/reducing the check with all files I tested. I have not been able to produce any files that have invalid data to match such a schema, but I'd assume it would error as it would for any invalid data.
tustvold commented on issue #5606: URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2043709651 We could probably just ignore the malformed map logical type and decode such columns as a regular list of structs. This would allow the data to be read, without needing to implement custom dremel shredding logic to handle the case of malformed MapArray, and allowing users to determine how they wish to handle this situation. Tagging @mapleFU who may have further thoughts on this
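The fallback proposed in the comment above can be sketched roughly as follows. This is only an illustration of the decision, not the arrow-rs implementation: `Repetition`, `Decoded`, and `decode_map_group` are hypothetical names invented for this sketch and do not correspond to types or functions in the `parquet` or `arrow` crates.

```rust
// Hypothetical, simplified model of the choice described above.
// None of these names come from the arrow-rs or parquet crates.

/// Repetition of the key field inside a group annotated as MAP.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition {
    Required,
    Optional,
}

/// How the reader would decode the annotated group.
#[derive(Debug, PartialEq)]
enum Decoded {
    /// Well-formed MAP: the key is required, so a map array can be built.
    Map,
    /// Malformed MAP: the annotation is ignored and the column is read
    /// as a plain list of structs with `key` and `value` fields.
    ListOfStructs,
}

/// Pick a decoding strategy from the key's repetition (assumed rule:
/// only a required key yields a map; anything else falls back).
fn decode_map_group(key_repetition: Repetition) -> Decoded {
    match key_repetition {
        Repetition::Required => Decoded::Map,
        Repetition::Optional => Decoded::ListOfStructs,
    }
}
```

Under this assumed rule, a file with an optional key stays readable, and the caller decides afterwards whether to reject null keys or convert the list of structs into a map.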
jupiter commented on issue #5606: URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2043496776 It was discussed, but I don't think that was the conclusion. The creator's issue was resolved by rewriting a file. In order to operate with precious Parquet files from huge data lakes (e.g. DataFusion probably would want to support files produced by other systems), I'm of the opinion that it should tolerate this like most of the other implementations do (e.g. DuckDB, parquet-tools, and probably many more). I'm all for correctness, but in this particular case you need to consider the intention and purpose. There is no way that an optional key can be intentional. Being compatible with a vast amount of data is the purpose of Parquet integration.
tustvold commented on issue #5606: URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2043422154 The conclusion of https://github.com/apache/arrow/issues/37389 appears to be that we are correct to refuse to read such malformed files, am I missing something here?
jupiter commented on issue #5606: URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2042869108 It was hard to say whether this should be regarded as a bug or feature request. It's a bug from the perspective that we'd expect broad compatibility.