aokolnychyi opened a new pull request #789: Fix race condition in 
SnapshotProducer
URL: https://github.com/apache/incubator-iceberg/pull/789
 
 
   This PR fixes a race condition in `SnapshotProducer` while generating a new 
snapshot id.
   
   As it turns out, we can end up using different snapshot ids in 
`MergingSnapshotProducer` while writing filtered manifests. That can lead to 
losing certain manifests that contain only deleted files, which, in turn, will 
corrupt the table stats.
   
   Take a look at the last line of this snippet in `MergingSnapshotProducer`:
   
   ```
   // filter any existing manifests
   List<ManifestFile> filtered;
   if (current != null) {
    List<ManifestFile> manifests = current.manifests();
    filtered = Arrays.asList(filterManifests(metricsEvaluator, manifests));
   } else {
    filtered = ImmutableList.of();
   }
   
   Iterable<ManifestFile> unmergedManifests = Iterables.filter(
       Iterables.concat(newManifestsWithMetadata, filtered),
       // only keep manifests that have live data files or that were written by this commit
       manifest -> manifest.hasAddedFiles() || manifest.hasExistingFiles() ||
           manifest.snapshotId() == snapshotId());
   ```
    
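   We need to ensure manifests written in `filterManifests` all have the same 
snapshot id. Otherwise, a manifest that contains only deleted files and was 
written with a different id fails the `manifest.snapshotId() == snapshotId()` 
check and is dropped.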
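
To illustrate the kind of race described here, below is a minimal sketch of a lazily assigned snapshot id that stays stable under concurrent access. It is not the code in this PR: the `LazySnapshotId` class, the `ID_SOURCE` counter, and the `newSnapshotId` helper are hypothetical stand-ins, and double-checked locking is just one possible way to make every caller see the same id.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a producer that assigns its snapshot id lazily.
public class LazySnapshotId {
  // Assumption: a simple counter stands in for the real id generator.
  private static final AtomicLong ID_SOURCE = new AtomicLong();

  // volatile so an id published by one thread is visible to all others
  private volatile Long snapshotId = null;

  // Unsafe variant, shown only as a comment: two threads can both observe
  // null and each generate a different id.
  //
  //   long snapshotId() {
  //     if (snapshotId == null) { snapshotId = newSnapshotId(); }
  //     return snapshotId;
  //   }

  // Safe variant: only the first caller generates the id; later callers reuse it.
  public long snapshotId() {
    Long id = snapshotId;
    if (id == null) {
      synchronized (this) {
        id = snapshotId;
        if (id == null) {
          id = newSnapshotId();
          snapshotId = id;
        }
      }
    }
    return id;
  }

  private static long newSnapshotId() {
    return ID_SOURCE.incrementAndGet();
  }
}
```

With this shape, tasks that call `snapshotId()` from several threads all get the same value, so a check such as `manifest.snapshotId() == snapshotId()` compares against a single id.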
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

