Github user taoli91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12693#discussion_r61986122
  
    --- Diff: 
core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala ---
    @@ -243,7 +243,16 @@ private[spark] object ReliableCheckpointRDD extends 
Logging {
           if (fs.exists(partitionerFilePath)) {
             val fileInputStream = fs.open(partitionerFilePath, bufferSize)
             val serializer = SparkEnv.get.serializer.newInstance()
    -        val deserializeStream = 
serializer.deserializeStream(fileInputStream)
    +        // make sure that the file is closed if error occurrs during 
deserialization
    +        val deserializeStream =
    --- End diff --
    
    @srowen Probably some code will explain things better. Do you mean this:
    
    ```scala
            var deserializeStream : DeserializationStream = null
    
            val partitioner = Utils.tryWithSafeFinally[Partitioner] {
              deserializeStream = serializer.deserializeStream(fileInputStream)
              deserializeStream.readObject[Partitioner]
            } {
              deserializeStream.close()
            }
    ```
    There are a few things I am concerning
    1. The semantic has slightly changed (`deserializeStream` is now `var` 
instead of `val`)
    2. I'm not sure is it always safe to close an partially initialized 
`deserializeStream`, in the case of the deserialization throwing exception. I'm 
pretty sure that if we close the `fileInputStream` first may cause the closing 
`deserializeSteam` throwing exception complaining that the input has been 
closed already. 
    
    FYI, I found this problem by while running the test case "checkpointing 
partitioners" in which the `corruptPartitionerFile` flag is turned on in the 
`CheckpointSuite`. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to