Github user ksakellis commented on a diff in the pull request:
https://github.com/apache/spark/pull/4599#discussion_r24707286
--- Diff: core/src/test/scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala ---
@@ -42,7 +42,15 @@ class WholeTextFileRecordReaderSuite extends FunSuite with BeforeAndAfterAll {
   private var factory: CompressionCodecFactory = _

   override def beforeAll() {
-    sc = new SparkContext("local", "test")
+    // Hadoop's FileSystem caching does not use the Configuration as part of its cache key, which
+    // can cause FileSystem.get(Configuration) to return a cached instance created with a different
+    // configuration than the one passed to get() (see HADOOP-8490 for more details). This caused
+    // hard-to-reproduce test failures, since any suites that were run after this one would inherit
+    // the new value of "fs.local.block.size" (see SPARK-5227 and SPARK-5679). To work around this,
+    // we disable FileSystem caching in this suite.
+    val conf = new SparkConf().set("spark.hadoop.fs.file.impl.disable.cache", "true")
--- End diff --
So, do you think we should disable it across all tests, just in case there
are other tests that also modify the Hadoop configuration thinking that the
config objects are local to them? It might bite someone else later if we
don't do this globally. I don't think there is a global test class that
every test inherits; maybe we can add it in SharedSparkContext, since a lot
of the new tests use that trait?
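To make the suggestion concrete, a minimal sketch of what applying the workaround in a shared trait could look like. This assumes the trait owns the SparkContext lifecycle the way Spark's SharedSparkContext does; the trait name here is hypothetical and the `spark.hadoop.` prefix is used because Spark copies such keys into the Hadoop Configuration it builds:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, Suite}

// Hypothetical sketch: a trait that suites mix in so the FileSystem-cache
// workaround is applied globally rather than repeated in each suite.
trait CacheSafeSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient var sc: SparkContext = _

  override def beforeAll() {
    super.beforeAll()
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName(self.getClass.getSimpleName)
      // Disable Hadoop's FileSystem cache so one suite's Configuration
      // changes (e.g. "fs.local.block.size") cannot leak into later suites.
      .set("spark.hadoop.fs.file.impl.disable.cache", "true")
    sc = new SparkContext(conf)
  }

  override def afterAll() {
    if (sc != null) {
      sc.stop()
      sc = null
    }
    super.afterAll()
  }
}
```

A suite would then just `extends FunSuite with CacheSafeSparkContext` and use `sc` directly, instead of constructing its own context in `beforeAll()`.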