Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4599#discussion_r24707723
--- Diff: core/src/test/scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala ---
@@ -42,7 +42,15 @@ class WholeTextFileRecordReaderSuite extends FunSuite with BeforeAndAfterAll {
     private var factory: CompressionCodecFactory = _

     override def beforeAll() {
-      sc = new SparkContext("local", "test")
+      // Hadoop's FileSystem caching does not use the Configuration as part of its cache key, which
+      // can cause Filesystem.get(Configuration) to return a cached instance created with a different
+      // configuration than the one passed to get() (see HADOOP-8490 for more details). This caused
+      // hard-to-reproduce test failures, since any suites that were run after this one would inherit
+      // the new value of "fs.local.block.size" (see SPARK-5227 and SPARK-5679). To work around this,
+      // we disable FileSystem caching in this suite.
+      val conf = new SparkConf().set("spark.hadoop.fs.file.impl.disable.cache", "true")
--- End diff --
Good question. If we wanted to disable this in all tests, then I think the right place to do that would be in the Maven and SBT builds, via system properties.

I chose not to do that here because I wasn't sure whether it might mask bugs: most users of Spark will run with FileSystem caching enabled, and I suspect that disabling it across the board could harm performance, since a lot of Hadoop code appears to assume that FileSystem.get is cheap and accordingly calls it many times.
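The caching pitfall being worked around can be sketched in plain Scala. Every name below (Conf, FakeFileSystem, FakeFileSystemCache) is a hypothetical stand-in rather than Hadoop's real API, but the cache-key behavior mirrors HADOOP-8490: the Configuration is not part of the cache key, so get() can return an instance built from someone else's settings.

```scala
// A minimal, self-contained sketch (hypothetical names) of the HADOOP-8490
// pitfall: the FileSystem cache is keyed by scheme -- NOT by the
// Configuration -- so get() can hand back an instance that was constructed
// from a different Configuration than the one passed in.
import scala.collection.mutable

case class Conf(settings: Map[String, String]) {
  def get(key: String, default: String): String = settings.getOrElse(key, default)
}

class FakeFileSystem(conf: Conf) {
  // Captured at construction time, like "fs.local.block.size" in the suite.
  val blockSize: Long = conf.get("fs.local.block.size", "33554432").toLong
}

object FakeFileSystemCache {
  // The Configuration is deliberately absent from the cache key.
  private val cache = mutable.Map.empty[String, FakeFileSystem]

  def get(scheme: String, conf: Conf): FakeFileSystem =
    if (conf.get("fs.file.impl.disable.cache", "false") == "true")
      new FakeFileSystem(conf)  // caching disabled: always a fresh instance
    else
      cache.getOrElseUpdate(scheme, new FakeFileSystem(conf))
}

object CacheDemo extends App {
  // One suite configures a tiny block size...
  val fs1 = FakeFileSystemCache.get("file", Conf(Map("fs.local.block.size" -> "512")))

  // ...and a later caller passing default settings still gets the cached
  // instance, silently inheriting blockSize = 512 instead of the default:
  val fs2 = FakeFileSystemCache.get("file", Conf(Map.empty))
  println(fs2.blockSize)  // 512

  // With caching disabled, the passed-in Configuration is actually honored:
  val fs3 = FakeFileSystemCache.get("file", Conf(Map("fs.file.impl.disable.cache" -> "true")))
  println(fs3.blockSize)  // 33554432
}
```

Disabling the cache per-suite (as in the diff) confines the fresh-instance cost to the one test that mutates "fs.local.block.size", instead of paying it everywhere.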