Re: [I] Cannot read Parquet files that do not specify Map keys as required [arrow-rs]

2024-04-30 Thread via GitHub


tustvold closed issue #5606: Cannot read Parquet files that do not specify Map 
keys as required
URL: https://github.com/apache/arrow-rs/issues/5606


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Cannot read Parquet files that do not specify Map keys as required [arrow-rs]

2024-04-11 Thread via GitHub


jupiter commented on issue #5606:
URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2050107030

   It works with all the files I tested when the check is removed or relaxed. I 
have not been able to produce any file containing data that is actually invalid 
under such a schema (i.e. a null map key), but I'd expect reading it to fail 
just as it would for any other invalid data.
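A minimal sketch of the relaxed behaviour described above, assuming the reader skips the schema-level "key must be required" check and instead validates each decoded entry. A file whose optional keys are never actually null then decodes normally, while a genuinely null key is reported as ordinary invalid data. `MapEntry` and `build_map_entries` are hypothetical names for illustration; this is not arrow-rs code.

```rust
/// Hypothetical decoded map entry: keys are non-null strings, values may be null.
#[derive(Debug, Clone, PartialEq)]
pub struct MapEntry {
    pub key: String,
    pub value: Option<i64>,
}

/// Lenient decoding sketch (assumption, not arrow-rs): the key field was
/// declared `optional` in the Parquet schema, so keys arrive as `Option<&str>`.
/// Instead of rejecting the schema up front, only data that actually breaks
/// the map contract (a null key) is rejected.
pub fn build_map_entries(
    keys: &[Option<&str>],
    values: &[Option<i64>],
) -> Result<Vec<MapEntry>, String> {
    if keys.len() != values.len() {
        return Err(format!(
            "key/value length mismatch: {} keys, {} values",
            keys.len(),
            values.len()
        ));
    }
    keys.iter()
        .zip(values.iter())
        .enumerate()
        .map(|(i, (key, value))| match key {
            // A present key is fine even though the schema allowed nulls.
            Some(k) => Ok(MapEntry { key: (*k).to_string(), value: *value }),
            // A null key is invalid map data: fail like any other corrupt input.
            None => Err(format!("null map key at entry {i}")),
        })
        .collect()
}
```

Under this approach the schema-level violation is tolerated, but the map invariant is still enforced on the data that is actually read.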



Re: [I] Cannot read Parquet files that do not specify Map keys as required [arrow-rs]

2024-04-08 Thread via GitHub


tustvold commented on issue #5606:
URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2043709651

   We could probably just ignore the malformed map logical type and decode such 
columns as a regular list of structs. This would allow the data to be read 
without needing custom Dremel shredding logic to handle a malformed MapArray, 
and would let users decide how they wish to handle this situation.
   
   Tagging @mapleFU who may have further thoughts on this
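The list-of-structs fallback suggested above can be sketched as follows. This uses a hypothetical, simplified schema model rather than the `parquet` crate's own types, and the rendered type strings are an informal notation, not arrow-rs `DataType` output. When the `key` field is `required`, the column is decoded as an Arrow map; otherwise the MAP annotation is ignored and the same repeated group is exposed as a list of `key`/`value` structs.

```rust
// Per the Parquet LogicalTypes spec, a MAP column is laid out as:
//
//   <map-repetition> group <name> (MAP) {
//     repeated group key_value {
//       required <key-type> key;
//       <value-repetition> <value-type> value;
//     }
//   }
//
// The files discussed here declare the key as `optional` instead.

/// Repetition of the `key` field inside `key_value` (`repeated` omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyRepetition {
    Required,
    Optional,
}

/// Hypothetical, simplified view of a MAP column's `key_value` group;
/// this is not the `parquet` crate's schema API.
pub struct KeyValueGroup {
    pub key_repetition: KeyRepetition,
    pub key_type: &'static str,
    pub value_type: &'static str,
}

/// Chooses the Arrow shape to decode into, rendered in an informal notation.
pub fn arrow_type_for(kv: &KeyValueGroup) -> String {
    match kv.key_repetition {
        // Well-formed map: Arrow map keys are non-nullable, so decode as a map.
        KeyRepetition::Required => format!("Map<{}, {}>", kv.key_type, kv.value_type),
        // Malformed map: ignore the MAP annotation and expose the repeated group
        // as a list of structs, leaving null-key handling to the caller.
        KeyRepetition::Optional => format!(
            "List<Struct<key: {} (nullable), value: {}>>",
            kv.key_type, kv.value_type
        ),
    }
}
```

With this design the reader stays strict about what it calls a map, while a caller who knows its keys are never null can still convert the list of structs into a map itself.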



Re: [I] Cannot read Parquet files that do not specify Map keys as required [arrow-rs]

2024-04-08 Thread via GitHub


jupiter commented on issue #5606:
URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2043496776

   It was discussed, but I don't think that was the conclusion. The original 
reporter's problem was resolved by rewriting a file.
   
   To work with precious Parquet files from huge data lakes (DataFusion, for 
example, would probably want to support files produced by other systems), I'm 
of the opinion that the reader should tolerate this, as most other 
implementations do (e.g. DuckDB, parquet-tools, and probably many more).
   
   I'm all for correctness, but in this particular case you need to consider 
the intention and purpose.
   
   There is no way that an optional key can be intentional. Being compatible 
with a vast amount of data is the purpose of Parquet integration. 



Re: [I] Cannot read Parquet files that do not specify Map keys as required [arrow-rs]

2024-04-08 Thread via GitHub


tustvold commented on issue #5606:
URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2043422154

   The conclusion of https://github.com/apache/arrow/issues/37389 appears to be 
that we are correct to refuse to read such malformed files, am I missing 
something here?



Re: [I] Cannot read Parquet files that do not specify Map keys as required [arrow-rs]

2024-04-08 Thread via GitHub


jupiter commented on issue #5606:
URL: https://github.com/apache/arrow-rs/issues/5606#issuecomment-2042869108

   It was hard to say whether this should be regarded as a bug or a feature 
request. It's a bug from the perspective that we'd expect broad compatibility.

