Github user advancedxy commented on the pull request:
https://github.com/apache/spark/pull/546#issuecomment-42525924
Sorry for being late for this conversation. I am busy with my own project.
First, I agree with @srowen. SPARK-1527 was discovered when there were
untracked files in my spark git dir after running tests. In
DiskBlockManagerSuite, files were created using file.getName as parent folds,
which was a relative path to cwd. The cleanup code deletes the original temp
dir.
```
val rootDir0 = Files.createTempDir()
rootDir0.deleteOnExit()
val rootDir1 = Files.createTempDir()
rootDir1.deleteOnExit()
val rootDirs = rootDir0.getName + "," + rootDir1.getName
```
This is a problem or a bug I prefer because it doesn't do it tends to do.
In SPARK-1527, srowen suggested that ```toString``` could return relative
path and should use getAbsolutePath instead. After thinking twice, I think it
would be ok to use ```toString``` if we make sure we don't change working
directory between file creation and deletion. That is to say even if
tempDir.toString is a relative path, we can still delete this file using
file.toString as file path.
Back in this pr, as @srowen said,
>the set files, which has Strings, is populated with the result of
File.getAbsolutePath() but then later unpopulated with the result of
File.toString(), which is File.getPath(), which is not necessarily the same for
the same file.
There is a problem. ```File.getAbsolutePaht``` is not necessarily the same
as ```File.getPath```, or even the same as ```File.getCanonicalPath```
see the code below:
```
scala>val fstr = "./1.txt"
fstr: String = ./1.txt
scala>val f = new File(fstr)
f: java.io.File = ./1.txt
scala> f.getPath
res0: String = ./1.txt
scala> f.getAbsolutePath
res1: String = /Users/yexianjin/./1.txt
scala> f.getCanonicalPath
res2: String = /Users/yexianjin/1.txt
```
@mridulm I think you are right, we are at case b)paths relative to cwd
and/or what is returned by Files.createTempFile, and I think we should stick to
this case. But, we need to review the codebase to make sure path manipulation
are correct.
So I suggest(like @srowen said):
1. choose ```getPath``` or ```getAbsolutePath```. ```getAbsolutePath``` can
deal with changing working directory situation, but ```getPath``` or
```toString``` is more consistent with spark codebase.
2. PR for SPARK-1527 could be committed if we use ```getAbsolutePath```,
otherwise, we can abandon PR for SPARK-1527 and this PR.
3. make a new one that implements b) or something similar based on our
choice.
what do you think?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---