edgarRd opened a new issue #767: Clarify / Document metrics contract 
URL: https://github.com/apache/incubator-iceberg/issues/767
 
 
   The metrics contract is unclear from the implementation. It is not defined 
in the spec, and Parquet is the only format with fully implemented metrics. 
While working on ORC metrics, I found it hard to tell what contract is 
expected, since file formats seem to implement this differently. For instance:
   
   * `Map<Integer, Long> valueCounts()` - it's not clear whether this method 
counts only non-null values or also null and repeated values. Judging by 
`TestMetrics`, value counts *include null and repeated values*, which would 
make them pretty much the same as the row count, except for nested structures 
(e.g. lists, maps). However, this is not defined.
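
   To make the question concrete, here is a minimal sketch of the reading that 
`TestMetrics` seems to imply. It is not Iceberg code: the class 
`ValueCountSketch` and its methods are hypothetical names used only for 
illustration, and the treatment of null or empty lists (contributing zero 
values here) is an assumption, which is exactly the sort of detail the 
contract would need to settle.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Iceberg.
public class ValueCountSketch {

  // Flat column: one value per row. Nulls and repeated values are counted,
  // so the result equals the row count.
  static long flatValueCount(List<Integer> column) {
    return column.size();
  }

  // List column: counts every element, including null and repeated elements.
  // Assumption: a null or empty list contributes zero values.
  static long listElementValueCount(List<List<Integer>> column) {
    long count = 0;
    for (List<Integer> row : column) {
      if (row != null) {
        count += row.size();
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Three rows: a value, a null, and a repeat of the first value.
    List<Integer> flat = Arrays.asList(1, null, 1);

    // Two rows: a list with a repeated element, and a list holding one null.
    List<List<Integer>> lists = Arrays.asList(
        Arrays.asList(1, 1, 2),
        Collections.singletonList((Integer) null));

    System.out.println("flat rows=" + flat.size()
        + " valueCount=" + flatValueCount(flat));
    System.out.println("list rows=" + lists.size()
        + " elementValueCount=" + listElementValueCount(lists));
  }
}
```

   Under this reading the flat column's value count matches its row count, 
while the list column's element count can exceed the number of rows.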
   
   This issue is to track the discussion about the expected metrics contract 
and get a clear definition.
