[GitHub] [spark] zhli1142015 edited a comment on pull request #28769: [SPARK-31929][WEBUI] Close leveldbiterator when leveldb.close
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-642344689

> Looks fine in general - one thing looks to be missed is, `LevelDBIterator.close()` should also call `db.closeIterator(this);` to make sure tracker can indicate it.

@HeartSaVioR, thanks for your comments. Sure, updated; it notifies the tracker via `db.notifyIteratorClosed(this);`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-642344689

> Looks fine in general - one thing looks to be missed is, `LevelDBIterator.close()` should also call `db.closeIterator(this);` to make sure tracker can indicate it.

@HeartSaVioR, thanks for your comments. Sure, updated.
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-642344689

> Looks fine in general - one thing looks to be missed is, `LevelDBIterator.close()` should also call `db.closeIterator(this);` to make sure tracker can indicate it.

@HeartSaVioR, thanks for your comments. I renamed the method `closeIterator` to `notifyIteratorClosed` and removed the `finalize` method, as it is not needed any more. With this PR, a `LevelDBIterator` is closed either by an explicit close call or by `DB.close`, so there will be no JNI resource leak from it after the DB is closed.
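The close-tracking pattern discussed above can be sketched in plain Java. This is a minimal, hypothetical illustration, not Spark's actual `LevelDB`/`LevelDBIterator` code: the store remembers every iterator it hands out, an explicitly closed iterator deregisters itself via `notifyIteratorClosed`, and the store's own `close()` sweeps up any iterators callers forgot about.

```java
import java.io.Closeable;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the tracking pattern (names are illustrative,
// not the Spark kvstore API).
class TrackingStore implements Closeable {
    // Iterators handed out and not yet closed.
    private final ConcurrentLinkedQueue<TrackedIterator> open = new ConcurrentLinkedQueue<>();

    TrackedIterator iterator() {
        TrackedIterator it = new TrackedIterator(this);
        open.add(it);
        return it;
    }

    // Called by an iterator when it is closed explicitly, so the
    // tracker stays accurate.
    void notifyIteratorClosed(TrackedIterator it) {
        open.remove(it);
    }

    @Override
    public void close() {
        // Close any iterators the callers forgot about; the weakly
        // consistent queue iterator tolerates concurrent removal.
        for (TrackedIterator it : open) {
            it.close();
        }
        open.clear();
    }

    int openCount() { return open.size(); }
}

class TrackedIterator implements Closeable {
    private final TrackingStore store;
    private boolean closed = false;

    TrackedIterator(TrackingStore store) { this.store = store; }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            store.notifyIteratorClosed(this);
        }
    }

    boolean isClosed() { return closed; }
}
```

With this shape, an iterator is closed exactly once whether the caller closes it or the store is closed first, which is the invariant the PR is after.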
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641752768

> Then how about capture the exception and ask the user to increase the related configuration or try loading the page again?

Because there is a disk space limitation, we can only mitigate it by stopping the service and manually cleaning the disk cache, which is a little annoying.

> What is the benefit of this PR to users?

We actually use the Spark History Server as a long-running service to provide a diagnostics experience to others. The benefit should be improved History Server reliability: on my side, we no longer need to stop the service and manually clean the HS disk cache after some period.
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641752768

> Then how about capture the exception and ask the user to increase the related configuration or try loading the page again?

Because there is a disk space limitation, we can only mitigate it by stopping the service and manually cleaning the disk cache, which is a little annoying.

> What is the benefit of this PR to users?

We actually use the Spark History Server as a long-running service to provide a diagnostics experience to others. The benefit to us is that we don't need to stop the service and manually clean the HS disk cache after some period.
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641752768

> Then how about capture the exception and ask the user to increase the related configuration or try loading the page again?

Because there is a disk space limitation, we can only mitigate it by stopping the service and manually cleaning the disk cache, which is a little annoying.

> What is the benefit of this PR to users?

We actually use the Spark History Server as a long-running service to provide a diagnostics experience to others. The benefit to us is that we can keep it as a truly long-running service and don't need to stop it and manually clean the HS disk cache after some period.
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641752768

> Then how about capture the exception and ask the user to increase the related configuration or try loading the page again?

Because there is a disk space limitation, we can only mitigate it by stopping the service and manually cleaning the disk cache, which is a little annoying.

> What is the benefit of this PR to users?

We actually use the Spark History Server as a long-running service to provide a diagnostics experience to others. The benefit to us is that we don't need to manually restart the HS after some period.
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641752768

> Then how about capture the exception and ask the user to increase the related configuration or try loading the page again?

Because there is a disk space limitation, we can only mitigate it by stopping the service and manually cleaning the disk cache, which is a little annoying.

> What is the benefit of this PR to users?

I think the cause of the issue is a resource leak (a file handle on Windows which prevents `HistoryServerDiskManager` from releasing space) caused by a race condition; my PR tries to fix this. We actually use the Spark History Server as a long-running service to provide a diagnostics experience to others. The benefit to us is that we don't need to manually restart the HS after some period.
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641752768

> Then how about capture the exception and ask the user to increase the related configuration or try loading the page again?

Because there is a disk space limitation, we can only mitigate it by stopping the service and manually cleaning the disk cache, which is a little annoying.

> What is the benefit of this PR to users?

I think the cause of the issue is a resource leak (a file handle on Windows which prevents `HistoryServerDiskManager` from releasing space) caused by a race condition; my PR tries to fix this. We actually use the Spark History Server to provide a diagnostics experience to others. The benefit to us is that we don't need to manually restart the HS after some period.
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641752768

> Then how about capture the exception and ask the user to increase the related configuration or try loading the page again?

Because there is a disk space limitation, we can only mitigate it by stopping the service and manually cleaning the disk cache, which is a little annoying.

> What is the benefit of this PR to users?

I think the cause of the issue is a resource leak (a file handle on Windows which prevents `HistoryServerDiskManager` from releasing space) caused by a race condition; my PR tries to fix this.
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641708063

> Of course relying on finalize is wrong, but I don't think the intent was to rely on finalize. Not closing these iterators is a bug. I see one case it clearly isn't; there may be others but haven't spotted them. It'd be nice to fix them all instead of the change in this patch but we may want to fix what we can see and also make the change in this patch for now.

Thanks for your comments, I get your point.
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641681617

> > Lots of the paths actually do close the iterator explicitly. And I'm still not sure how .toSeq doesn't consume it.
>
> I'm not sure you're referring KV store iterator, as I don't see it. Could you please point out some places?
>
> > Recall that the underlying iterator closes itself when it consumes all elements.
>
> Maybe you refer to `CompletionIterator` which still need to call `close` explicitly on its implementation. That implementation is on core module and KV store module doesn't even know about such existence.
> (And I barely remember that there has been some issue on iterator not fully consumed and completion function is missed to be called.)
>
> > We can't get at the iterator that .asScala produces but we might be able to add a close() method to the view or something, but that's getting ugly
>
> I agree current code is pretty much concise, but I'm really worrying that we don't consider resource leak as serious one and be the one of items in trade-offs. I'm not sure I can agree with the view that resource leak can be tolerated to make code beauty. The problem wasn't observed because Linux deals with the file deletion on usage - if we have been having extensive Windows users then the problem should be raised much earlier.
>
> We can still play with try-with-resource pattern with the returned value of `closeableIterator` - it may still allow us to wrap with asScala, since it also implements Iterator. That forces us to live with a couple of more lines instead of one-liner but I'm not sure it matters.

@srowen, I found more `AppStatusStore`-related stack traces in the log, and also one case for `closeableIterator`; I believe that is the case where not all elements are consumed before the DB is closed. BTW, from my experiment, if the iterator is not closed but the DB is closed, it is not possible to read any data from the iterator (`NoSuchElementException`). So I think closing iterators when the DB is closed should be safe, since `AppStatusStore` is also closed at the same time.

```
at org.apache.spark.status.AppStatusStore.resourceProfileInfo(AppStatusStore.scala:56)
at org.apache.spark.status.AppStatusStore.jobsList(AppStatusStore.scala:60)
at org.apache.spark.status.AppStatusStore.executorList(AppStatusStore.scala:86)
at org.apache.spark.status.AppStatusStore.executorSummary(AppStatusStore.scala:421)
at org.apache.spark.status.AppStatusStore.rddList(AppStatusStore.scala:425)
at org.apache.spark.status.AppStatusStore.streamBlocksList(AppStatusStore.scala:503)
at org.apache.spark.status.AppStatusStore.constructTaskDataList(AppStatusStore.scala:537)
```

Iterator is not closed before db is closed: 1424827908, construct stack trace:

```
java.lang.Throwable
	at org.apache.spark.util.kvstore.LevelDBIterator.<init>(LevelDBIterator.java:58)
	at org.apache.spark.util.kvstore.LevelDB$1.iterator(LevelDB.java:201)
	at org.apache.spark.util.kvstore.KVStoreView.closeableIterator(KVStoreView.java:117)
	at org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:122)
	at org.apache.spark.ui.jobs.JobPage.$anonfun$render$6(JobPage.scala:205)
	at org.apache.spark.status.AppStatusStore.asOption(AppStatusStore.scala:436)
	at org.apache.spark.ui.jobs.JobPage.$anonfun$render$5(JobPage.scala:205)
	at org.apache.spark.ui.jobs.JobPage.$anonfun$render$5$adapted(JobPage.scala:202)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.immutable.List.map(List.scala:298)
	at org.apache.spark.ui.jobs.JobPage.render(JobPage.scala:202)
	at org.apache.spark.ui.WebUI.$anonfun$attachPage$1(WebUI.scala:89)
	at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:80)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
	at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
	at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
	at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
	at org.apache.spark.deploy.history.ApplicationCacheCheckFilter.doFilter(ApplicationCache.scala:404)
	at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
```
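The `NoSuchElementException` behaviour described above can be illustrated with a minimal stand-in iterator. This is a hypothetical sketch, not Spark's `LevelDBIterator`: once closed, the iterator reports no elements, so code that tries to consume it after the DB is gone fails fast instead of reading through a dead handle.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in mirroring the observed behaviour: reading after
// close() yields no elements and next() throws NoSuchElementException.
class ClosableIntIterator implements Iterator<Integer>, AutoCloseable {
    private int next = 0;
    private final int limit;
    private volatile boolean closed = false;

    ClosableIntIterator(int limit) { this.limit = limit; }

    @Override
    public boolean hasNext() {
        return !closed && next < limit;
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return next++;
    }

    @Override
    public void close() { closed = true; }
}
```

Because it implements `AutoCloseable`, it also fits the try-with-resources pattern HeartSaVioR suggests for `closeableIterator`: `try (ClosableIntIterator it = new ClosableIntIterator(3)) { ... }` guarantees the close even when not all elements are consumed.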
zhli1142015 edited a comment on pull request #28769:
URL: https://github.com/apache/spark/pull/28769#issuecomment-641655009

If we look at the description in the comment below, from https://github.com/apache/spark/blob/master/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java, I think that even if the caller calls close, it may still leak:

```java
/**
 * Closes the given iterator if the DB is still open. Trying to close a JNI LevelDB handle
 * with a closed DB can cause JVM crashes, so this ensures that situation does not happen.
 */
void closeIterator(LevelDBIterator it) throws IOException {
  synchronized (this._db) {
    DB _db = this._db.get();
    if (_db != null) {
      it.close();
    }
  }
}
```
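The leak being pointed at can be made concrete with a small hypothetical model of the guard quoted above (illustrative names, not the Spark classes): iterator close is forwarded only while the DB handle is still alive, so a close that arrives after `DB.close()` is silently skipped and the iterator's native resources are never released.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the quoted guard: closing is skipped once the
// DB handle is gone, which avoids a JNI crash but leaks the iterator.
class GuardedDb {
    // Stand-in for the native DB handle; null means the DB is closed.
    private final AtomicReference<Object> handle = new AtomicReference<>(new Object());
    private int leaked = 0;

    void close() {
        synchronized (handle) { handle.set(null); }
    }

    // Mirrors the quoted closeIterator: only close while the DB is open.
    // Returns true if the iterator was actually closed.
    boolean closeIterator(GuardedIterator it) {
        synchronized (handle) {
            if (handle.get() != null) {
                it.close();
                return true;
            }
            leaked++;  // close skipped: the iterator is never released
            return false;
        }
    }

    int leakedCount() { return leaked; }
}

class GuardedIterator {
    private boolean closed = false;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
}
```

This is exactly why the PR moves to closing all tracked iterators inside `DB.close()` itself: the late-arriving caller-side close can then safely be a no-op, because the iterator was already released while the handle was still valid.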