[ https://issues.apache.org/jira/browse/PARQUET-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Ivanfi resolved PARQUET-899.
-----------------------------------
    Resolution: Duplicate

Quoting from the commit for PARQUET-352:

WriteSupport now has a getName getter method that is added to the footer
if it returns a non-null string as writer.model.name. This is intended
to help identify files written by object models incorrectly.

So writer.model.name is already there for this purpose, albeit undocumented.
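
For illustration, a reader could key a workaround off that footer entry roughly as sketched below. This is a minimal sketch, not code from the Parquet library: the class WriterModelCheck, its methods, and the set of affected model names are hypothetical, and the map is assumed to be the footer's key-value metadata (in the Java library, what FileMetaData.getKeyValueMetaData() returns).

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of the Parquet library.
final class WriterModelCheck {
    // Footer key named in the PARQUET-352 commit message.
    static final String WRITER_MODEL_KEY = "writer.model.name";

    // Returns the recorded object model name, or empty if the
    // WriteSupport returned null and nothing was written to the footer.
    static Optional<String> writerModel(Map<String, String> keyValueMetadata) {
        return Optional.ofNullable(keyValueMetadata.get(WRITER_MODEL_KEY));
    }

    // Assumed policy: apply a workaround only when the file names a
    // writer model that is on a caller-supplied list of affected models.
    static boolean needsWorkaround(Map<String, String> keyValueMetadata,
                                   Set<String> affectedModels) {
        return writerModel(keyValueMetadata)
                .map(affectedModels::contains)
                .orElse(false);
    }
}
```

Files without the key are treated as unaffected here, which is only one possible choice; a stricter reader might treat an unknown writer as suspect.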

> Add metadata field describing the application that wrote the file
> -----------------------------------------------------------------
>
>                 Key: PARQUET-899
>                 URL: https://issues.apache.org/jira/browse/PARQUET-899
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Zoltan Ivanfi
>            Priority: Major
>
> Although the Parquet library should behave the same regardless of what 
> application uses it, occasionally serious interoperability bugs are 
> introduced in specific applications. For example, data written by a specific 
> application may be unnecessarily adjusted or the calculated statistics may be 
> invalid (both actual problems).
> Unfortunately, currently it is not possible to recognize Parquet files 
> affected by application problems because the metadata does not contain any 
> information about the application using the Parquet library. (The name and 
> version number of the Parquet library are recorded, but that only has limited 
> use, because apart from Impala, the most widespread Parquet writers all use 
> the same Java library.)
> To allow creating workarounds for future known issues, we should introduce 
> new metadata fields that applications can populate. The simplest approach is 
> to have one field for the application name and another for its version 
> number. A more sophisticated approach suggested by [~julienledem] could also 
> reference a list of earlier issues that are known to be fixed in the 
> application that wrote the Parquet file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
