GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1616#issuecomment-51085554
  
    This seems like a reasonable fix and I'd like to get it into a release, but I'm concerned that it doesn't correctly handle every feature of `fetchFile`.
    
    For example, there's [some code](https://github.com/li-zhihui/spark/blob/cachefiles/core/src/main/scala/org/apache/spark/util/Utils.scala#L444) in `fetchFile` that automatically decompresses `.tar.gz` files.  I don't remember why this code was added (or whether it's actually correct, since it seems to assume that files are downloaded into the current working directory), but I'm not sure that `fetchCachedFile` handles that case properly: it looks like it would only copy the `.tar.gz` file into the executor's directory without decompressing it there.
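    
    For reference, here's a minimal standalone sketch of that decompression step (paraphrased, not the actual `Utils.scala` code; `maybeDecompress` and its parameters are made up for illustration):
    
    ```scala
    import java.io.File
    import scala.sys.process._
    
    // Roughly what the .tar.gz handling in fetchFile does: shell out to tar,
    // extracting into targetDir. A relative fileName is resolved against the
    // directory tar runs in, which is why the logic appears to depend on the
    // file having been downloaded into the right directory in the first place.
    def maybeDecompress(fileName: String, targetDir: File): Unit = {
      if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
        Process(Seq("tar", "-xzf", fileName), targetDir).!
      } else if (fileName.endsWith(".tar")) {
        Process(Seq("tar", "-xf", fileName), targetDir).!
      }
    }
    ```
    
    If the cached path only copies the archive, a step like this never runs in the executor's directory.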
    
    We could fix this special case by moving the decompression logic into `fetchCachedFile`, but I'm worried that would make `fetchFile` even harder to understand.  I think `fetchFile` might be due for a refactoring.
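    
    To make that concrete, one possible shape for such a refactoring (a sketch only, with made-up names) would separate the raw download from the archive handling so each concern can be reused on its own:
    
    ```scala
    import java.io.File
    
    // Hypothetical decomposition: the raw download and the archive handling
    // become separate private helpers that any entry point (cached or
    // uncached) can compose.
    object FetchSketch {
      private def doFetchFile(url: String, targetDir: File): File = ???    // download only
      private def unpackIfArchive(file: File, targetDir: File): Unit = ??? // .tar.gz / .tar handling
    
      def fetchFile(url: String, targetDir: File): Unit = {
        val downloaded = doFetchFile(url, targetDir)
        unpackIfArchive(downloaded, targetDir)
      }
    }
    ```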
    
    Also, do you think we should just replace `fetchFile` with 
`fetchCachedFile` and keep the uncached version private?
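    
    Concretely, the public surface might then look something like this (made-up signatures, just to illustrate the visibility change):
    
    ```scala
    import java.io.File
    
    object UtilsSketch {
      // The cached variant is the single public entry point; the uncached
      // fetch becomes an implementation detail it can fall back to.
      def fetchCachedFile(url: String, targetDir: File): Unit = {
        // look up / populate the cache, copy into targetDir, unpack archives
      }
    
      private def fetchFile(url: String, targetDir: File): Unit = {
        // plain uncached download
      }
    }
    ```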

