modi95 commented on issue #25176: [SPARK-28417][CORE] Wrap File Glob Resolution in DoAs to use ProxyUser Credentials
URL: https://github.com/apache/spark/pull/25176#issuecomment-512541489

> I think `downloadFileList` should also be wrapped with `doAs` as proxy user.

Makes sense. I actually found quite a few things that would benefit from being wrapped in `doAs`. [All of these lines](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L392-L457) fetch a file using `downloadFile` (`downloadFileList` is a wrapper around `downloadFile`).

Do you think I should wrap all of these lines in `doAs`, or should `downloadFile` ([link to definition](https://github.com/apache/spark/blob/c277afb12b61a91272568dd46380c0d0a9958989/core/src/main/scala/org/apache/spark/util/Utils.scala#L553)) accept a `ugi` object as a parameter?

Alternatively, I could create a new `downloadFileAs(path: String, targetDir: File, sparkConf: SparkConf, hadoopConf: Configuration, secMgr: SecurityManager, ugi: UserGroupInformation): String` function that wraps it.

I could also edit `downloadFile` into this:

```scala
def downloadFile(
    path: String,
    targetDir: File,
    sparkConf: SparkConf,
    hadoopConf: Configuration,
    secMgr: SecurityManager,
    ugi: UserGroupInformation = UserGroupInformation.getCurrentUser): String = {
  require(path != null, "path cannot be null.")
  val uri = Utils.resolveURI(path)

  uri.getScheme match {
    case "file" | "local" => path
    case "http" | "https" | "ftp" if Utils.isTesting =>
      // This is only used for SparkSubmitSuite unit test. Instead of downloading file remotely,
      // return a dummy local path instead.
      val file = new File(uri.getPath)
      new File(targetDir, file.getName).toURI.toString
    case _ =>
      val fname = new Path(uri).getName()
      var localFile: File = null
      // Fetch the remote file with the credentials of `ugi` (the proxy user, if one was passed).
      ugi.doAs(new PrivilegedExceptionAction[Unit]() {
        override def run(): Unit = {
          localFile = Utils.doFetchFile(uri.toString(), targetDir, fname, sparkConf, secMgr,
            hadoopConf)
        }
      })
      localFile.toURI().toString()
  }
}
```
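The `doAs` call in the snippet above could also return the fetched file directly instead of assigning to `var localFile`, because `UserGroupInformation.doAs` is generic in the action's result type. A minimal sketch, assuming only hadoop-common on the classpath; `runAs` and `RunAsExample` are hypothetical names for illustration, not existing Spark or Hadoop APIs:

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

object RunAsExample {
  // Hypothetical helper: evaluates `body` with the credentials of `ugi`
  // and returns whatever `body` returns.
  def runAs[T](ugi: UserGroupInformation)(body: => T): T =
    ugi.doAs(new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })

  def main(args: Array[String]): Unit = {
    // "alice" is a placeholder user name, created without any real credentials.
    val proxy = UserGroupInformation.createRemoteUser("alice")
    // Inside the block, Hadoop reports `proxy` as the current user.
    val name = runAs(proxy) { UserGroupInformation.getCurrentUser.getShortUserName }
    println(name)
  }
}
```

With a helper like this, the `case _` branch could become a single expression such as `runAs(ugi) { Utils.doFetchFile(...) }.toURI().toString()`, and the call sites in `SparkSubmit` could wrap their existing calls the same way if `downloadFile` is left unchanged.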
