anishshri-db opened a new pull request, #41089:
URL: https://github.com/apache/spark/pull/41089

   ### What changes were proposed in this pull request?
   Skip reusing SST files recorded for the same version of the RocksDB state 
store, to avoid an ID mismatch error.
   
   ### Why are the changes needed?
   When a task is retried on the same executor, it's possible that the original 
task already finished creating the SST files and uploading them to the 
object store. In that case, it may also have added an entry to the in-memory 
map `versionToRocksDBFiles` for the given version. When the retried task 
creates its local checkpoint, the file name and size may be the same, 
but the metadata ID embedded within the file may be different. So, when we try 
to load this version after a successful commit, the metadata zip file points to the 
old SST file, which results in a RocksDB unique ID mismatch error.
   
   ```
   Mismatch in unique ID on table file 24220. Expected: 
{9692563551998415634,4655083329411385714} Actual: 
{9692563551998415639,10299185534092933087} in file 
/local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212
   ```
   
   This change skips the map entries for the version being committed when looking 
for reusable files on the same host, which reduces the chance of running into the 
error above.
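
The idea can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: apart from `versionToRocksDBFiles`, every name here (`UploadedFile`, `SstReuseSketch`, `reusableFiles`, `filesToUpload`) is hypothetical, and the real file manager tracks more state than this. The sketch only shows that entries recorded for the version being committed are excluded from the reuse lookup.

```scala
import scala.collection.mutable

// Hypothetical, simplified record of an SST file already uploaded to the object store.
final case class UploadedFile(localName: String, dfsName: String, sizeBytes: Long)

object SstReuseSketch {
  // Stand-in for the in-memory map from the description: version -> files uploaded for it.
  val versionToRocksDBFiles: mutable.Map[Long, Seq[UploadedFile]] = mutable.Map.empty

  // Files eligible for reuse when committing `version`: only entries recorded for
  // OTHER versions. Entries for `version` itself are skipped, because a retried task
  // may have regenerated a file with the same name and size but a different embedded ID.
  def reusableFiles(version: Long): Map[String, UploadedFile] =
    versionToRocksDBFiles.iterator
      .filter { case (v, _) => v != version }
      .flatMap { case (_, files) => files }
      .map(f => f.localName -> f)
      .toMap

  // Returns the local (name, size) pairs that have no reusable match and must be uploaded.
  def filesToUpload(version: Long, local: Seq[(String, Long)]): Seq[(String, Long)] = {
    val reusable = reusableFiles(version)
    local.filterNot { case (name, size) =>
      reusable.get(name).exists(_.sizeBytes == size)
    }
  }
}
```

In this sketch, a retry of version 2 still reuses a matching file recorded under version 1, but re-uploads any file whose only match was recorded under version 2 itself.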
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Unit test
   
   RocksDBSuite
   ```
   [info] Run completed in 35 seconds, 995 milliseconds.
   [info] Total number of tests run: 33
   [info] Suites: completed 1, aborted 0
   [info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
   [info] All tests passed.
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

