Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/546#issuecomment-42525924
  
    Sorry for being late for this conversation. I am busy with my own project.
    
    First, I agree with @srowen. SPARK-1527 was discovered when there were 
untracked files in my spark git dir after running tests. In 
DiskBlockManagerSuite, files were created using file.getName as parent folds, 
which was a relative path to cwd. The cleanup code deletes the original temp 
dir. 
    ```
    val rootDir0 = Files.createTempDir()
    rootDir0.deleteOnExit()
    val rootDir1 = Files.createTempDir()
    rootDir1.deleteOnExit()
    val rootDirs = rootDir0.getName + "," + rootDir1.getName
    ```
    
    This is a problem or a bug I prefer because it doesn't do it tends to do.
    
    In SPARK-1527, srowen suggested that ```toString``` could return relative 
path and should use getAbsolutePath instead. After thinking twice, I think it 
would be ok to use ```toString``` if we make sure we don't change working 
directory between file creation and deletion.  That is to say even if 
tempDir.toString is a relative path, we can still delete this file using 
file.toString as file path.
    
    Back in this pr, as @srowen said, 
    >the set files, which has Strings, is populated with the result of 
File.getAbsolutePath() but then later unpopulated with the result of 
File.toString(), which is File.getPath(), which is not necessarily the same for 
the same file.
    
    There is a problem. ```File.getAbsolutePaht``` is not necessarily the same 
as ```File.getPath```, or even the same as ```File.getCanonicalPath```
    see the code below:
    ```
    scala>val fstr = "./1.txt"
    fstr: String = ./1.txt
    
    scala>val f = new File(fstr)
    f: java.io.File = ./1.txt
    
    scala> f.getPath
    res0: String = ./1.txt
    
    scala> f.getAbsolutePath
    res1: String = /Users/yexianjin/./1.txt
    
    scala> f.getCanonicalPath
    res2: String = /Users/yexianjin/1.txt
    ```
    
    @mridulm I think you are right, we are at case b)paths relative to cwd 
and/or what is returned by Files.createTempFile, and I think we should stick to 
this case. But, we need to review the codebase to make sure path manipulation 
are correct.
    
    So I suggest(like @srowen said):
    1. choose ```getPath``` or ```getAbsolutePath```. ```getAbsolutePath``` can 
deal with changing working directory situation, but ```getPath``` or 
```toString``` is more consistent with spark codebase.
    2. PR for SPARK-1527 could be committed if we use ```getAbsolutePath```, 
otherwise, we can abandon PR for SPARK-1527 and this PR.
    3. make a new one that implements b) or something similar based on our 
choice.
    
    what do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to