Imran Rashid created SPARK-17676:
------------------------------------

             Summary: FsHistoryProvider should ignore hidden files
                 Key: SPARK-17676
                 URL: https://issues.apache.org/jira/browse/SPARK-17676
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Imran Rashid
            Assignee: Imran Rashid
            Priority: Minor

FsHistoryProvider currently reads hidden files (names beginning with ".") from the log dir. However, it also writes a hidden file *itself* to that dir, which cannot be parsed, as part of a trick to find the scan time according to the file system:

{code}
val fileName = "." + UUID.randomUUID().toString
val path = new Path(logDir, fileName)
val fos = fs.create(path)
{code}

It does delete the tmp file immediately, but we've seen cases where that race ends badly and an error is logged. The error is harmless (the log file is ignored and Spark moves on to the other log files), but the logged error is very confusing for users, so we should avoid it.

{noformat}
2016-09-26 09:10:03,016 ERROR org.apache.spark.deploy.history.FsHistoryProvider: Exception encountered when attempting to load application log hdfs://XXX/user/spark/applicationHistory/.3a5e987c-ace5-4568-9ccd-6285010e399a
java.lang.IllegalArgumentException: Codec [3a5e987c-ace5-4568-9ccd-6285010e399a] is not available. Consider setting spark.io.compression.codec=lzf
    at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(CompressionCodec.scala:72)
    at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(CompressionCodec.scala:72)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:72)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8$$anonfun$apply$1.apply(EventLoggingListener.scala:309)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8$$anonfun$apply$1.apply(EventLoggingListener.scala:309)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8.apply(EventLoggingListener.scala:309)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8.apply(EventLoggingListener.scala:308)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.scheduler.EventLoggingListener$.openEventLog(EventLoggingListener.scala:308)
    at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:518)
    at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:359)
    at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:356)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:356)
    at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$4.run(FsHistoryProvider.scala:277)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
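
One way to avoid the error would be to skip dot-prefixed names when scanning the log dir. Below is a minimal sketch of that idea, working on plain file names so that it runs without Hadoop on the classpath. `HiddenLogFilter`, `isHidden`, `visibleLogs` and the `app-...` file names are made up for illustration; this is not the actual FsHistoryProvider code, and the eventual fix may look different.

```scala
// Hypothetical sketch: these names are illustrative and do not come from Spark.
object HiddenLogFilter {

  /** True for names beginning with ".", such as the temporary scan-time file. */
  def isHidden(fileName: String): Boolean = fileName.startsWith(".")

  /** Keeps only the directory entries that should be treated as event logs. */
  def visibleLogs(fileNames: Seq[String]): Seq[String] =
    fileNames.filterNot(isHidden)

  def main(args: Array[String]): Unit = {
    // Assumed directory listing; only the dot-prefixed name is taken from the log above.
    val listing = Seq(
      "app-20160926091003-0001",
      ".3a5e987c-ace5-4568-9ccd-6285010e399a",
      "app-20160926091003-0002.lzf"
    )
    // Prints the two app-... entries and drops the hidden one.
    visibleLogs(listing).foreach(println)
  }
}
```

In the real provider the same predicate would presumably be applied to the name of each entry returned by the file system listing (for example via `Path.getName`), before any attempt to open the entry as an event log.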