[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19786

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152347531

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala

```diff
@@ -34,10 +34,10 @@
 import org.apache.hadoop.fs.permission.FsAction
 import org.apache.hadoop.hdfs.DistributedFileSystem
 import org.apache.hadoop.hdfs.protocol.HdfsConstants
 import org.apache.hadoop.security.AccessControlException
+import org.fusesource.leveldbjni.internal.NativeDB

 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
 import org.apache.spark.deploy.SparkHadoopUtil
-import org.apache.spark.deploy.history.config._
```

End diff comment:

This import is not used.

---
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152345297

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala

```diff
@@ -570,7 +575,6 @@
     }

     val logPath = fileStatus.getPath()
-    logInfo(s"Replaying log path: $logPath")
```

End diff comment:

I removed this line because the `replay` call below already logs "Replaying log path: ...", so this message was duplicated.

---
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152338077

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala

```diff
@@ -132,15 +132,20 @@
       AppStatusStore.CURRENT_VERSION,
       logDir.toString())

     try {
-      open(new File(path, "listing.ldb"), metadata)
+      open(dbPath, metadata)
     } catch {
       // If there's an error, remove the listing database and any existing UI database
       // from the store directory, since it's extremely likely that they'll all contain
       // incompatible information.
       case _: UnsupportedStoreVersionException | _: MetadataMismatchException =>
         logInfo("Detected incompatible DB versions, deleting...")
         path.listFiles().foreach(Utils.deleteRecursively)
-        open(new File(path, "listing.ldb"), metadata)
+        open(dbPath, metadata)
+      case e: Exception =>
+        // Get rid of the old data and re-create it. The store is either old or corrupted.
```

End diff comment:

Well, the comment is copied. I think Vanzin means compatibility with an older version of the LevelDB release, while the previous case is about the version of the format the history server writes into LevelDB.

---
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152322074

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (same hunk as quoted above); the comment is on:

```diff
+      case e: Exception =>
+        // Get rid of the old data and re-create it. The store is either old or corrupted.
```

End diff comment:

`is either old`: I think the old-version problem is handled by the previous `case`?

---
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152040491

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (same hunk as quoted above); the comment is on:

```diff
+      case e: Exception =>
+        // Get rid of the old data and re-create it. The store is either old or corrupted.
+        logWarning(s"Failed to load disk store $path for app listing.", e)
```

End diff comment:

Yes, I thought about catching `DBException`, but in this PR I just followed the existing handling for opening the disk store for an app: https://github.com/apache/spark/blob/0ffa7c488fa8156e2a1aa282e60b7c36b86d8af8/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L307

---
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152039449

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (same hunk as quoted above); the comment is on:

```diff
+      case e: Exception =>
+        // Get rid of the old data and re-create it. The store is either old or corrupted.
+        logWarning(s"Failed to load disk store $path for app listing.", e)
```

End diff comment:

I see. Shall we catch `DBException` instead of the general `Exception`?

---
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152038882

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (same hunk as quoted above); the comment is on:

```diff
+      case e: Exception =>
+        // Get rid of the old data and re-create it. The store is either old or corrupted.
+        logWarning(s"Failed to load disk store $path for app listing.", e)
```

End diff comment:

@cloud-fan No. Currently the history server writes a LevelDB file for each application (if its page is accessed), plus one db called listing.ldb which stores the application listing. If the version does not match, all the db files should be deleted. In this PR, if only the listing db is corrupted, we just delete the listing db and recreate it.

---
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152037332

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (same hunk as quoted above); the comment is on:

```diff
+      case e: Exception =>
+        // Get rid of the old data and re-create it. The store is either old or corrupted.
+        logWarning(s"Failed to load disk store $path for app listing.", e)
```

End diff comment:

Shall we merge it into the previous case?

---
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r151988844

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (same hunk as quoted above); the comment is on:

```diff
+      case e: Exception =>
+        // Get rid of the old data and re-create it. The store is either old or corrupted.
+        logWarning(s"Failed to load disk store $path for app listing.", e)
```

End diff comment:

No, the file listing.ldb is built from the event logs and stores the list of applications (the history server home page). It is used like a cache.

---
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r151983008

Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (same hunk as quoted above); the comment is on:

```diff
+      case e: Exception =>
+        // Get rid of the old data and re-create it. The store is either old or corrupted.
+        logWarning(s"Failed to load disk store $path for app listing.", e)
```

End diff comment:

Is there a danger here that a transient error causes this to delete an otherwise valid file?

---
GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/19786

[SPARK-22559][CORE] history server: handle exception on opening corrupted listing.ldb

## What changes were proposed in this pull request?

Currently history server v2 fails to start if `listing.ldb` is corrupted. This patch gets rid of the corrupted `listing.ldb` and re-creates it. The exception handling follows [opening disk store for app](https://github.com/apache/spark/blob/0ffa7c488fa8156e2a1aa282e60b7c36b86d8af8/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L307).

## How was this patch tested?

Manual test.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark listingException

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19786.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19786

commit 054cd7e208c908aa50cc8030b15feee9d2ce71e9
Author: Wang Gengliang
Date: 2017-11-20T06:59:40Z

    history server: handle exception on opening corrupted listing.ldb

---
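The delete-and-recreate fallback discussed in this thread can be sketched in stand-alone Scala. This is a simplified, hypothetical illustration, not the actual Spark code: `openDb`, `openListingDb`, and the `CORRUPT` marker-file check are invented stand-ins (the real code calls LevelDB via `KVUtils.open`, which throws on a genuinely corrupted store).

```scala
import java.io.File

// Hypothetical stand-in for opening the LevelDB-backed store. A "CORRUPT"
// marker file simulates a corrupted listing.ldb; the real failure would be a
// DBException raised by LevelDB itself.
def openDb(dbPath: File): String = {
  if (new File(dbPath, "CORRUPT").exists()) {
    throw new RuntimeException(s"corrupted store at $dbPath")
  }
  s"db@${dbPath.getName}"
}

def deleteRecursively(f: File): Unit = {
  if (f.isDirectory) f.listFiles().foreach(deleteRecursively)
  f.delete()
}

// The pattern under review: try to open listing.ldb; on failure, assume the
// listing db alone is corrupted, delete just it, and recreate it. (The
// version-mismatch case in the real code wipes the whole store directory.)
def openListingDb(storeDir: File): String = {
  val dbPath = new File(storeDir, "listing.ldb")
  try {
    openDb(dbPath)
  } catch {
    case e: Exception =>
      // Get rid of the old data and re-create it; the listing db is only a
      // cache built from the event logs, so this loses no source data.
      deleteRecursively(dbPath)
      dbPath.mkdirs()
      openDb(dbPath)
  }
}
```

Because listing.ldb is rebuilt from the event logs, dropping it on any open failure trades a one-time rebuild cost for robustness; srowen's concern above is that a transient error would also trigger the deletion.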