Pratyaksh Sharma created HUDI-796:
-------------------------------------

Summary: Rewrite DedupeSparkJob.scala without considering the _hoodie_commit_time
Key: HUDI-796
URL: https://issues.apache.org/jira/browse/HUDI-796
Project: Apache Hudi (incubating)
Issue Type: Improvement
Reporter: Pratyaksh Sharma
Assignee: Pratyaksh Sharma
_`_hoodie_commit_time` can only be used to dedupe a partition path if the duplicates were introduced by an INSERT operation. In the case of updates, the bloom filter tags every file in which the record is present, and the update is written to each of those files, so from then on all copies of the duplicated record carry the same `_hoodie_commit_time`._

_Hence it makes sense to rewrite this class without considering the metadata field._

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
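To make the INSERT versus update distinction concrete, here is a minimal Scala sketch on plain collections (no Spark). Everything in it is hypothetical: `HoodieRow`, `DedupeSketch`, `keepLatestByCommitTime` and `dedupeByKey` are illustrative names, not Hudi APIs. The commit-time rule is assumed to be "keep the copies with the latest `_hoodie_commit_time`", which is not necessarily what DedupeSparkJob.scala does today, and the file-name tie-break in `dedupeByKey` is an arbitrary assumption, since this issue does not say which copy the rewrite should keep.

```scala
// Hypothetical stand-in for a Hudi row: the record key, its commit time,
// and the name of the data file holding this copy. Not a Hudi type.
final case class HoodieRow(recordKey: String, commitTime: String, fileName: String)

object DedupeSketch {
  // Assumed commit-time rule: for each record key, keep only the copies that
  // carry the latest commit time. Hudi instant times are fixed-width digit
  // strings, so lexicographic max is also the newest commit.
  def keepLatestByCommitTime(rows: Seq[HoodieRow]): Seq[HoodieRow] =
    rows.groupBy(_.recordKey).values.flatMap { copies =>
      val latest = copies.map(_.commitTime).max
      copies.filter(_.commitTime == latest)
    }.toSeq

  // Commit-time-free rule: keep exactly one copy per record key.
  // Picking the smallest file name is an arbitrary tie-break for this sketch.
  def dedupeByKey(rows: Seq[HoodieRow]): Seq[HoodieRow] =
    rows.groupBy(_.recordKey).values.map(_.minBy(_.fileName)).toSeq

  def main(args: Array[String]): Unit = {
    // INSERT duplicates: the two copies were written by different commits.
    val insertDup = Seq(
      HoodieRow("k1", "20200415100000", "f1.parquet"),
      HoodieRow("k1", "20200416100000", "f2.parquet"))
    // Update duplicates: both tagged files were rewritten by the same commit.
    val updateDup = Seq(
      HoodieRow("k1", "20200417100000", "f1.parquet"),
      HoodieRow("k1", "20200417100000", "f2.parquet"))

    println(keepLatestByCommitTime(insertDup).size) // 1: commit time separates the copies
    println(keepLatestByCommitTime(updateDup).size) // 2: commit time cannot separate them
    println(dedupeByKey(updateDup).size)            // 1: key-only rule still dedupes
  }
}
```

With the insert duplicates the commit-time rule leaves a single copy; with the update duplicates it leaves both, which is the failure described above, while the key-only rule keeps exactly one copy in either case.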