[GitHub] spark issue #22398: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/22398

@cloud-fan Thanks for the merge again! :)

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22398: [SPARK-23820][CORE] Enable use of long form of ca...
Github user michaelmior commented on a diff in the pull request: https://github.com/apache/spark/pull/22398#discussion_r217013397

--- Diff: docs/configuration.md ---
@@ -746,6 +746,13 @@ Apart from these, the following properties are also available, and may be useful
     *Warning*: This will increase the size of the event log considerably.
+
+  spark.eventLog.longForm.enabled
+  false
+
+  Whether to use the long form of call sites in the event log.
--- End diff --

Changed. I was thinking the behaviour on false was implicit, but nothing wrong with being explicit anyway :)
[GitHub] spark pull request #22398: [SPARK-23820][CORE] Enable use of long form of ca...
GitHub user michaelmior opened a pull request: https://github.com/apache/spark/pull/22398

[SPARK-23820][CORE] Enable use of long form of callsite in logs

This is a rework of #21433 to address some concerns there.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/michaelmior/spark long-callsite2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22398.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22398

commit 15edc21325f4d6cd249626032aae621880aaf75a
Author: Michael Mior
Date: 2018-03-28T20:57:41Z

    [SPARK-23820][CORE] Enable use of long form of callsite in logs
[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/21433 @gatorsmile @cloud-fan I'll just go with a boolean config as there really is no need for more than two options and this simplifies things quite a bit.
[GitHub] spark pull request #21433: [SPARK-23820][CORE] Enable use of long form of ca...
Github user michaelmior commented on a diff in the pull request: https://github.com/apache/spark/pull/21433#discussion_r216285120

--- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala ---
@@ -53,10 +55,16 @@ class RDDInfo(
 }

 private[spark] object RDDInfo {
+  private val callsiteForm = SparkEnv.get.conf.get(EVENT_LOG_CALLSITE_FORM)
+
   def fromRdd(rdd: RDD[_]): RDDInfo = {
     val rddName = Option(rdd.name).getOrElse(Utils.getFormattedClassName(rdd))
     val parentIds = rdd.dependencies.map(_.rdd.id)
+    val callSite = callsiteForm match {
+      case "short" => rdd.creationSite.shortForm
+      case "long" => rdd.creationSite.longForm
+    }
     new RDDInfo(rdd.id, rddName, rdd.partitions.length,
-      rdd.getStorageLevel, parentIds, rdd.creationSite.shortForm, rdd.scope)
+      rdd.getStorageLevel, parentIds, callSite, rdd.scope)
--- End diff --

@gatorsmile As far as I am aware, `RDDInfo` is the only place the call site is included in the event log.
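The selection logic in the diff above can be modeled in isolation. Below is a minimal sketch where `CallSite` and the config string are illustrative stand-ins for Spark's internal call-site type and the `EVENT_LOG_CALLSITE_FORM` config value, not Spark's actual API:

```scala
// Minimal model of the call-site selection in the diff above.
// CallSite here is a stand-in for Spark's internal type.
final case class CallSite(shortForm: String, longForm: String)

object CallSiteSketch {
  // Pick which form of the call site goes into the event log,
  // mirroring the `callsiteForm match` in the patch.
  def select(site: CallSite, callsiteForm: String): String = callsiteForm match {
    case "short" => site.shortForm
    case "long"  => site.longForm
  }
}
```

A boolean flag (as the thread later converges on) would reduce this to a simple `if`, avoiding the partial match on arbitrary strings.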
[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/21433 Thanks @srowen!
[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/21433 @srowen Yes, I don't expect it will be widely used but I've personally found it helpful in some performance debugging and it's a fairly low impact change. I was just hoping to avoid having to keep applying this patch and doing my own build of Spark in the future :)
[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/21433 Rebased on top of master. The [failing test](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4200/testReport/org.apache.spark.sql.hive/HiveExternalCatalogVersionsSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/) is unrelated.
[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/21433 Done!
[GitHub] spark pull request #21433: [SPARK-23820][CORE] Enable use of long form of ca...
GitHub user michaelmior opened a pull request: https://github.com/apache/spark/pull/21433

[SPARK-23820][CORE] Enable use of long form of callsite in logs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/michaelmior/spark long-callsite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21433.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21433

commit 9800d2eddf100d32223e2ef3cae1ad3279368454
Author: Michael Mior
Date: 2018-03-28T20:57:41Z

    [SPARK-23820][CORE] Enable use of long form of callsite in logs
[GitHub] spark issue #12162: [SPARK-14289][WIP] Support multiple eviction strategies ...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/12162 As best I can tell, the code that was pushed here is incomplete. However, Spark's default cache eviction policy is LRU. You can find the code which performs eviction [here](https://github.com/apache/spark/blob/1e82335413bc2384073ead0d6d581c862036d0f5/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L501). It works by storing all the data in a `LinkedHashMap` configured to track which elements were accessed most recently.
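The `LinkedHashMap` property described above can be demonstrated on its own: constructed with `accessOrder = true`, the map iterates entries least-recently-accessed first, so the head of the iteration is the next eviction candidate. A small self-contained sketch (block names are illustrative, not from `MemoryStore`):

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import scala.jdk.CollectionConverters._

object LruSketch {
  // Returns the key iteration order after a few puts and one access.
  // With accessOrder = true, iteration is least-recently-used first.
  def evictionOrder(): List[String] = {
    val cache = new JLinkedHashMap[String, Long](16, 0.75f, /* accessOrder = */ true)
    cache.put("block-a", 1L)
    cache.put("block-b", 2L)
    cache.put("block-c", 3L)
    cache.get("block-a") // touching "block-a" makes it most recently used
    cache.keySet.asScala.toList // head of this list is the eviction candidate
  }
}
```

Here `LruSketch.evictionOrder()` yields `List("block-b", "block-c", "block-a")`: "block-b" was least recently touched and would be evicted first.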
[GitHub] spark issue #9428: [SPARK-8582][Core]Optimize checkpointing to avoid computi...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/9428 @zsxwing Thanks for the pointer. It's not clear to me why this needs to be supported (and in fact the test no longer exists). However, I'm also not clear on why the test fails in the first place (I compiled and ran the code), but that's probably because I'm relatively new to Scala and don't fully understand the semantics of lazy vals.
[GitHub] spark issue #9428: [SPARK-8582][Core]Optimize checkpointing to avoid computi...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/9428 @zsxwing @andrewor14 Would either of you be able to explain briefly why this approach doesn't work?
[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/19263 Thanks for the merge!
[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/19263 @vanzin Made the changes you recommended. Thanks for the feedback!
[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...
Github user michaelmior commented on a diff in the pull request: https://github.com/apache/spark/pull/19263#discussion_r145203563

--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -41,6 +41,22 @@ package object config {
     .bytesConf(ByteUnit.MiB)
     .createWithDefaultString("1g")

+  private[spark] val EVENT_LOG_COMPRESS =
+    ConfigBuilder("spark.eventLog.compress").booleanConf.createWithDefault(false)
--- End diff --

Okay, I'll change all the event logging configs.
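The pattern in the diff above replaces stringly-typed `sparkConf.getBoolean` lookups with typed config entries. A simplified, self-contained sketch of the idea (the names `ConfigEntry`, `readFrom`, and `ConfigSketch` are illustrative, not Spark's internal `ConfigBuilder` API):

```scala
// Hedged sketch of a typed-config entry: a key paired with a typed default,
// so callers never repeat the key string or the default value.
final case class ConfigEntry[T](key: String, default: T) {
  def readFrom(settings: Map[String, String])(parse: String => T): T =
    settings.get(key).map(parse).getOrElse(default)
}

object ConfigSketch {
  // Analogous to the EVENT_LOG_COMPRESS entry in the diff.
  val EventLogCompress: ConfigEntry[Boolean] =
    ConfigEntry("spark.eventLog.compress", default = false)

  def compressEnabled(settings: Map[String, String]): Boolean =
    EventLogCompress.readFrom(settings)(_.toBoolean)
}
```

The benefit is the one the review asks for: the key, type, and default live in a single definition, so every read site stays consistent.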
[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...
Github user michaelmior commented on a diff in the pull request: https://github.com/apache/spark/pull/19263#discussion_r145202921

--- Diff: docs/configuration.md ---
@@ -714,6 +714,13 @@ Apart from these, the following properties are also available, and may be useful
 Property Name    Default    Meaning
+  spark.eventLog.blockUpdatesEnabled
+  false
+
+  Whether to log events for every block update, if spark.eventLog.enabled is true.
--- End diff --

Will do.
[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...
Github user michaelmior commented on a diff in the pull request: https://github.com/apache/spark/pull/19263#discussion_r145202877

--- Diff: docs/configuration.md ---
@@ -714,6 +714,13 @@ Apart from these, the following properties are also available, and may be useful
 Property Name    Default    Meaning
+  spark.eventLog.blockUpdatesEnabled
--- End diff --

How about `spark.eventLog.logBlockUpdates.enabled`?
[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...
Github user michaelmior commented on a diff in the pull request: https://github.com/apache/spark/pull/19263#discussion_r145202605

--- Diff: core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala ---
@@ -291,6 +292,7 @@ object EventLoggingListenerSuite {
   def getLoggingConf(logDir: Path, compressionCodec: Option[String] = None): SparkConf = {
     val conf = new SparkConf
     conf.set("spark.eventLog.enabled", "true")
+    conf.set("spark.eventLog.blockUpdatesEnabled", "true")
--- End diff --

It is being tested. I added a test for a `SparkListenerBlockUpdated` event that won't be generated unless the configuration is enabled.
[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/19263 @jerryshao Agreed on both points. Changed!
[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/19263 @vanzin Just wanted to ping since I believe I addressed your concerns. Let me know if it looks like any other changes are required.
[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...
Github user michaelmior commented on a diff in the pull request: https://github.com/apache/spark/pull/19263#discussion_r141916752

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -66,6 +67,7 @@ private[spark] class EventLoggingListener(
   private val shouldCompress = sparkConf.getBoolean("spark.eventLog.compress", false)
   private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
+  private val shouldLogBlockUpdates = sparkConf.getBoolean("spark.eventLog.blockUpdates", false)
--- End diff --

Thanks for clarifying. I opted to convert other event log settings as well for uniformity.
[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...
Github user michaelmior commented on a diff in the pull request: https://github.com/apache/spark/pull/19263#discussion_r141849398

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -66,6 +67,7 @@ private[spark] class EventLoggingListener(
   private val shouldCompress = sparkConf.getBoolean("spark.eventLog.compress", false)
   private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
+  private val shouldLogBlockUpdates = sparkConf.getBoolean("spark.eventLog.blockUpdates", false)
--- End diff --

@vanzin The reason I didn't is because all of the other config settings nearby don't. Would it make sense to change the other event log settings as part of this?
[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/19263 @jerryshao I agree that the history server itself doesn't provide useful information but for detailed analysis I've found it helpful to persist the logs and write my own scripts to analyze them. This extra information is helpful for that.
[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/19263 Whoops. Sorry about that. I opened the PR via the CLI so I didn't see the pointer on the web interface. I should have known better though. Updated!
[GitHub] spark pull request #19263: Optionally add block updates to log
GitHub user michaelmior opened a pull request: https://github.com/apache/spark/pull/19263

Optionally add block updates to log

I see that block updates are not logged to the event log. This makes sense as a default for performance reasons. However, when trying to get a better understanding of caching for a job, I find it helpful to be able to log these updates. This PR adds a configuration setting `spark.eventLog.blockUpdates` (defaulting to false) which allows block updates to be recorded in the log.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/michaelmior/spark log-block-updates

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19263.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19263

commit 2a967c368cf72316b760c5f17503658aa12b901d
Author: Michael Mior
Date: 2017-07-12T13:32:24Z

    Optionally add block updates to log
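The opt-in behaviour this PR describes amounts to a guard in the event logging path: block-update events are dropped unless the flag is set. A minimal, self-contained sketch of that logic, with a simplified event type standing in for Spark's `SparkListenerEvent` hierarchy (names here are illustrative, not Spark's API):

```scala
// Simplified stand-ins for Spark's listener event types.
sealed trait Event
case object BlockUpdated extends Event
case object StageCompleted extends Event

object EventLogSketch {
  // Filter the event stream the way the proposed flag would:
  // block updates are written to the log only when opted in.
  def eventsToLog(events: Seq[Event], shouldLogBlockUpdates: Boolean): Seq[Event] =
    if (shouldLogBlockUpdates) events
    else events.filterNot(_ == BlockUpdated)
}
```

With the flag off (the default), a stream containing `BlockUpdated` events logs everything else unchanged; with it on, the full stream is logged, at the cost of a much larger event log.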
[GitHub] spark issue #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setName for D...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/16609

Is there any way to create a name without caching the result? I'm parsing the Spark event logs to get some additional data about jobs and having names there is very helpful, but I don't want to affect the behaviour of the application.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #17900: [SPARK-20637][Core] Remove mention of old RDD cla...
GitHub user michaelmior opened a pull request: https://github.com/apache/spark/pull/17900

[SPARK-20637][Core] Remove mention of old RDD classes from comments

## What changes were proposed in this pull request?

A few comments around the code mention RDD classes that do not exist anymore. I'm not sure of the best way to replace these, so I've just removed them here.

## How was this patch tested?

Only changes code comments, no testing required.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/michaelmior/spark remove-old-rdds

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17900.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17900

commit eb45ce2cb3228d0b95c7b873e6ad82b6c4f15e15
Author: Michael Mior
Date: 2017-05-08T15:38:36Z

    Remove mention of old RDD classes from comments
[GitHub] spark issue #12162: [SPARK-14289][WIP] Support multiple eviction strategies ...
Github user michaelmior commented on the issue: https://github.com/apache/spark/pull/12162 This branch appears to be incomplete. The configuration parameter `entryEvictionPolicy` does not exist and there is a good chunk of the code that does not do anything.