aokolnychyi opened a new pull request #789: Fix race condition in SnapshotProducer
URL: https://github.com/apache/incubator-iceberg/pull/789

This PR fixes a race condition in `SnapshotProducer` when generating a new snapshot id. As it turns out, `MergingSnapshotProducer` can end up using different snapshot ids while writing filtered manifests. That can lead to losing manifests that contain only deleted files, which, in turn, corrupts the table stats.

Take a look at the last line of this snippet in `MergingSnapshotProducer`:

```
// filter any existing manifests
List<ManifestFile> filtered;
if (current != null) {
  List<ManifestFile> manifests = current.manifests();
  filtered = Arrays.asList(filterManifests(metricsEvaluator, manifests));
} else {
  filtered = ImmutableList.of();
}

Iterable<ManifestFile> unmergedManifests = Iterables.filter(
    Iterables.concat(newManifestsWithMetadata, filtered),
    // only keep manifests that have live data files or that were written by this commit
    manifest -> manifest.hasAddedFiles() || manifest.hasExistingFiles() || manifest.snapshotId() == snapshotId());
```

A manifest that contains only deleted files has neither added nor existing files, so it passes this filter only through the `manifest.snapshotId() == snapshotId()` check. If it was written under a different snapshot id, it is dropped. We need to ensure manifests written in `filterManifests` all have the same snapshot id.
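To illustrate the requirement, here is a minimal sketch of one way to hand out a single snapshot id under concurrent access. It is not the code from this PR: `SnapshotIdHolder`, `ID_SOURCE`, and this `newSnapshotId()` are hypothetical stand-ins, and double-checked locking on a `volatile` field is just one possible approach.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a snapshot producer; not Iceberg's SnapshotProducer.
final class SnapshotIdHolder {
  // Hypothetical id source; a real table would generate ids differently.
  private static final AtomicLong ID_SOURCE = new AtomicLong();

  // volatile so that a thread seeing a non-null value sees the fully assigned id
  private volatile Long snapshotId = null;

  // Assumption: each call produces a fresh id, so calling it twice is what we must avoid.
  private long newSnapshotId() {
    return ID_SOURCE.incrementAndGet();
  }

  // Returns the same id to every caller, generating it at most once.
  long snapshotId() {
    Long id = snapshotId;
    if (id == null) {
      synchronized (this) {
        id = snapshotId;
        if (id == null) {
          id = newSnapshotId();
          snapshotId = id;
        }
      }
    }
    return id;
  }

  public static void main(String[] args) {
    SnapshotIdHolder holder = new SnapshotIdHolder();
    long first = holder.snapshotId();
    long second = holder.snapshotId();
    System.out.println("same id on repeated calls: " + (first == second));
  }
}
```

Without the synchronized check, two threads that both observe `null` could each generate an id, and manifests written by one of them would then fail the `manifest.snapshotId() == snapshotId()` comparison.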