[GitHub] spark pull request #16924: [SPARK-19531] Send UPDATE_LENGTH for Spark Histor...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16924 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16924: [SPARK-19531] Send UPDATE_LENGTH for Spark Histor...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16924#discussion_r101502214

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -137,7 +138,13 @@ private[spark] class EventLoggingListener(
         // scalastyle:on println
         if (flushLogger) {
           writer.foreach(_.flush())
    -      hadoopDataStream.foreach(_.hflush())
    +      hadoopDataStream.foreach(ds => {
    --- End diff --

    OK, and it's not better to just call hsync in all cases -- you have to special case this?
[GitHub] spark pull request #16924: [SPARK-19531] Send UPDATE_LENGTH for Spark Histor...
Github user dosoft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16924#discussion_r101487342

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -137,7 +138,13 @@ private[spark] class EventLoggingListener(
         // scalastyle:on println
         if (flushLogger) {
           writer.foreach(_.flush())
    -      hadoopDataStream.foreach(_.hflush())
    +      hadoopDataStream.foreach(ds => {
    --- End diff --

    hsync() is even stronger than hflush(): under the covers both methods use the same flushOrSync(), but hsync performs additional tasks such as flushing OS buffers (fsync).
[GitHub] spark pull request #16924: [SPARK-19531] Send UPDATE_LENGTH for Spark Histor...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16924#discussion_r101403013

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -137,7 +138,13 @@ private[spark] class EventLoggingListener(
         // scalastyle:on println
         if (flushLogger) {
           writer.foreach(_.flush())
    -      hadoopDataStream.foreach(_.hflush())
    +      hadoopDataStream.foreach(ds => {
    --- End diff --

    OK, if in doubt, would it be perhaps safer to preserve the existing behavior and hflush in all cases?
[GitHub] spark pull request #16924: [SPARK-19531] Send UPDATE_LENGTH for Spark Histor...
Github user dosoft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16924#discussion_r101161834

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -137,7 +138,13 @@ private[spark] class EventLoggingListener(
         // scalastyle:on println
         if (flushLogger) {
           writer.foreach(_.flush())
    -      hadoopDataStream.foreach(_.hflush())
    +      hadoopDataStream.foreach(ds => {
    --- End diff --

    It seems that hflush() is not required there.
[GitHub] spark pull request #16924: [SPARK-19531] Send UPDATE_LENGTH for Spark Histor...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16924#discussion_r101126469

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -20,16 +20,17 @@ package org.apache.spark.scheduler
     import java.io._
     import java.net.URI
     import java.nio.charset.StandardCharsets
    +import java.util
    --- End diff --

    Import the class please
[GitHub] spark pull request #16924: [SPARK-19531] Send UPDATE_LENGTH for Spark Histor...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16924#discussion_r101126998

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -137,7 +138,13 @@ private[spark] class EventLoggingListener(
         // scalastyle:on println
         if (flushLogger) {
           writer.foreach(_.flush())
    -      hadoopDataStream.foreach(_.hflush())
    +      hadoopDataStream.foreach(ds => {
    --- End diff --

    ```
    ...foreach(df => df.getWrappedStream match {
      case wrapped: DFSOutputStream => wrapped.hsync(...)
      case _ => df.hflush()
    })
    ```
    maybe? I think that 95% works. You don't hflush in the first case?
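
The dispatch suggested in this comment can be tried out without a Hadoop classpath. The following is only a sketch of the pattern, assuming stand-in types: `CanFlush`, `PlainStream`, `DfsLikeStream`, `DataStream`, `Flag`, and `UpdateLength` are all hypothetical names invented for illustration and are not the real Hadoop classes. The argument passed to `hsync` here is likewise an assumption based on the pull request title; the original snippet leaves it elided.

```scala
// Stand-in types only: none of these are the real Hadoop classes.
object FlushDispatchDemo {

  sealed trait Flag
  case object UpdateLength extends Flag // hypothetical analogue of UPDATE_LENGTH

  /** Any stream that can at least flush; methods return a label so the path taken is observable. */
  trait CanFlush { def hflush(): String }

  /** A plain stream: flushing is all it offers. */
  final class PlainStream extends CanFlush {
    def hflush(): String = "hflush"
  }

  /** An HDFS-like stream that additionally supports a flagged sync. */
  final class DfsLikeStream extends CanFlush {
    def hflush(): String = "hflush"
    def hsync(flags: Set[Flag]): String = s"hsync(${flags.mkString(",")})"
  }

  /** Wrapper that exposes the underlying stream, playing the role of `ds` in the diff. */
  final class DataStream(val wrappedStream: CanFlush) {
    def hflush(): String = wrappedStream.hflush()
  }

  /** Use the flagged sync when the wrapped stream supports it, otherwise fall back to hflush. */
  def flush(ds: DataStream): String = ds.wrappedStream match {
    case dfs: DfsLikeStream => dfs.hsync(Set(UpdateLength))
    case _                  => ds.hflush()
  }
}
```

Matching on the wrapped stream's type keeps the existing `hflush` behavior for every other file system, which is the concern raised elsewhere in this thread about preserving the old code path.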
[GitHub] spark pull request #16924: [SPARK-19531] Send UPDATE_LENGTH for Spark Histor...
GitHub user dosoft opened a pull request:

    https://github.com/apache/spark/pull/16924

    [SPARK-19531] Send UPDATE_LENGTH for Spark History service

    ## What changes were proposed in this pull request?

    While an .inprogress file stored on HDFS is being written, Hadoop doesn't update the file length until the file is closed, and therefore Spark's history server can't detect any changes. We have to send UPDATE_LENGTH manually.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dosoft/spark SPARK-19531

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16924.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16924

commit f87c5155832435c9dc17053521d61ae0ce06f8d8
Author: Oleg Danilov
Date:   2017-02-01T13:06:22Z

    [SPARK-19531] Send UPDATE_LENGTH for Spark History service
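
The problem described in this pull request can be illustrated with a toy model. This is a sketch under stated assumptions, not Hadoop's implementation: `ToyFileStatus`, `ToyOutputStream`, `flushOnly`, `flushAndUpdateLength`, and `hasChanged` are hypothetical names. The model assumes only what the description says, namely that a plain flush leaves the recorded file length unchanged until close, so a reader that polls the length sees nothing new.

```scala
// Toy model only: these classes are hypothetical stand-ins, not Hadoop APIs.
object LengthVisibilityDemo {

  /** Stand-in for the file metadata that a reader (e.g. a history server) polls. */
  final class ToyFileStatus {
    var recordedLength: Long = 0L
  }

  /** Stand-in for an output stream to a file that is still open for writing. */
  final class ToyOutputStream(status: ToyFileStatus) {
    private var buffered: Long = 0L // bytes written but not yet flushed
    private var flushed: Long = 0L  // bytes pushed out to storage

    def write(bytes: Array[Byte]): Unit = buffered += bytes.length

    /** Pushes data out, but leaves the recorded length untouched. */
    def flushOnly(): Unit = {
      flushed += buffered
      buffered = 0L
    }

    /** Pushes data out and also publishes the new length. */
    def flushAndUpdateLength(): Unit = {
      flushOnly()
      status.recordedLength = flushed
    }

    /** Closing always publishes the final length. */
    def close(): Unit = flushAndUpdateLength()
  }

  /** A poller that only notices a file when its recorded length grows. */
  def hasChanged(lastSeen: Long, status: ToyFileStatus): Boolean =
    status.recordedLength > lastSeen

  def main(args: Array[String]): Unit = {
    val status = new ToyFileStatus
    val out = new ToyOutputStream(status)

    out.write("event-1\n".getBytes("UTF-8"))
    out.flushOnly()
    println(s"after flushOnly: changed = ${hasChanged(0L, status)}")

    out.write("event-2\n".getBytes("UTF-8"))
    out.flushAndUpdateLength()
    println(s"after flushAndUpdateLength: changed = ${hasChanged(0L, status)}")
  }
}
```

In this model the poller stays blind after `flushOnly` even though the data has left the writer, and only the length-publishing flush makes the growth observable before close.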