xkrogen commented on a change in pull request #29357:
URL: https://github.com/apache/spark/pull/29357#discussion_r465812314



##########
File path: scalastyle-config.xml
##########
@@ -264,6 +264,19 @@ This file is divided into 3 sections:
     of Commons Lang 2 (package org.apache.commons.lang.*)</customMessage>
   </check>
 
+  <check customId="FileSystemGet" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
+    <parameters><parameter name="regex">FileSystem.get\([a-zA-Z_$][a-zA-Z_$0-9]*\)</parameter></parameters>
+    <customMessage><![CDATA[
+      Are you sure that you want to use "FileSystem.get(Configuration conf)"? If the input
+      configuration is not set properly, a default FileSystem instance will be returned. Please use
+      "FileSystem.get(URI uri, Configuration conf)" or "Path.getFileSystem(Configuration conf)" instead.

Review comment:
       Using `FileSystem.get(Configuration conf)` is dangerous even outside of the misconfiguration scenario you describe. It's entirely possible for Spark applications to be working with multiple `FileSystem` instances (say, local HDFS and a cloud blob store, or two HDFS clusters). In that case `FileSystem.get(Configuration conf)` only ever returns the default `FileSystem`, which may not be the one that owns the path in question, and that can also result in nasty errors.
   
   Not sure if it's worth mentioning here; I'll leave that up to your judgement.
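
   For reference, here is a minimal sketch of which call shapes the regex in this hunk would flag. It uses only `java.util.regex`, with the pattern copied verbatim from the `<parameter name="regex">` element. The class name `FileSystemGetCheck`, the helper `flagged`, and the sample lines are hypothetical, and it assumes the checker searches for the pattern anywhere in a line, which may not match exactly how `RegexChecker` scans a file.

```java
import java.util.regex.Pattern;

public class FileSystemGetCheck {
    // Pattern copied verbatim from the scalastyle rule in the diff above.
    static final Pattern RULE =
        Pattern.compile("FileSystem.get\\([a-zA-Z_$][a-zA-Z_$0-9]*\\)");

    // Returns true if the pattern occurs anywhere in the given source line.
    // (Substring search is an assumption about how the checker applies it.)
    static boolean flagged(String line) {
        return RULE.matcher(line).find();
    }

    public static void main(String[] args) {
        // Hypothetical source lines, for illustration only.
        String[] samples = {
            "val fs = FileSystem.get(conf)",
            "val fs = FileSystem.get(uri, conf)",
            "val fs = path.getFileSystem(conf)",
            "val fs = FileSystem.get(sc.hadoopConfiguration)",
        };
        for (String s : samples) {
            System.out.println(flagged(s) + "  " + s);
        }
    }
}
```

   Under these assumptions only the first sample is flagged. The two recommended forms pass, but so does a single-argument call whose argument is not a bare identifier (the last sample), so the rule would not catch every use of `FileSystem.get(Configuration conf)`.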



