[GitHub] spark issue #22398: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-09-13 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/22398
  
@cloud-fan Thanks again for the merge! :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22398: [SPARK-23820][CORE] Enable use of long form of ca...

2018-09-12 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/22398#discussion_r217013397
  
--- Diff: docs/configuration.md ---
@@ -746,6 +746,13 @@ Apart from these, the following properties are also available, and may be useful
 *Warning*: This will increase the size of the event log considerably.
 
+
+  spark.eventLog.longForm.enabled
+  false
+  
+    Whether to use the long form of call sites in the event log.
--- End diff --

Changed. I was thinking the behaviour on false was implicit, but nothing 
wrong with being explicit anyway :)
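
For reference, the setting under discussion would be enabled alongside event logging in `spark-defaults.conf` along these lines (a sketch; long-form call sites stay off by default):

```
spark.eventLog.enabled           true
spark.eventLog.longForm.enabled  true
```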


---




[GitHub] spark pull request #22398: [SPARK-23820][CORE] Enable use of long form of ca...

2018-09-11 Thread michaelmior
GitHub user michaelmior opened a pull request:

https://github.com/apache/spark/pull/22398

[SPARK-23820][CORE] Enable use of long form of callsite in logs

This is a rework of #21433 to address some concerns there.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/michaelmior/spark long-callsite2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22398


commit 15edc21325f4d6cd249626032aae621880aaf75a
Author: Michael Mior 
Date:   2018-03-28T20:57:41Z

[SPARK-23820][CORE] Enable use of long form of callsite in logs




---




[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-09-10 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/21433
  
@gatorsmile @cloud-fan I'll just go with a boolean config as there really 
is no need for more than two options and this simplifies things quite a bit.
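
As a sketch of why the boolean simplifies things (names here are illustrative, not the final Spark API): the selection collapses to a plain conditional instead of a string match, and there is no unexpected third value to handle:

```scala
// Illustrative only: with a boolean flag, choosing between the two call site
// forms is a total function -- a string-valued config would need a catch-all
// case to avoid a MatchError on unexpected values.
def chooseCallSite(shortForm: String, longForm: String, useLongForm: Boolean): String =
  if (useLongForm) longForm else shortForm

println(chooseCallSite(
  "count at Foo.scala:10",
  "count at Foo.scala:10\nFoo.bar(Foo.scala:10)",
  useLongForm = false))
```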


---




[GitHub] spark pull request #21433: [SPARK-23820][CORE] Enable use of long form of ca...

2018-09-10 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/21433#discussion_r216285120
  
--- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala ---
@@ -53,10 +55,16 @@ class RDDInfo(
 }
 
 private[spark] object RDDInfo {
+  private val callsiteForm = SparkEnv.get.conf.get(EVENT_LOG_CALLSITE_FORM)
+
   def fromRdd(rdd: RDD[_]): RDDInfo = {
     val rddName = Option(rdd.name).getOrElse(Utils.getFormattedClassName(rdd))
     val parentIds = rdd.dependencies.map(_.rdd.id)
+    val callSite = callsiteForm match {
+      case "short" => rdd.creationSite.shortForm
+      case "long" => rdd.creationSite.longForm
+    }
     new RDDInfo(rdd.id, rddName, rdd.partitions.length,
-      rdd.getStorageLevel, parentIds, rdd.creationSite.shortForm, rdd.scope)
+      rdd.getStorageLevel, parentIds, callSite, rdd.scope)
--- End diff --

@gatorsmile As far as I am aware, `RDDInfo` is the only place the call site 
is included in the event log.


---




[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-07-05 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/21433
  
Thanks @srowen!


---




[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-07-03 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/21433
  
@srowen Yes, I don't expect it will be widely used, but I've personally 
found it helpful in some performance debugging, and it's a fairly low-impact 
change. I was just hoping to avoid having to keep applying this patch and doing 
my own build of Spark in the future :)


---




[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-07-02 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/21433
  
Rebased on top of master. The [failing test](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4200/testReport/org.apache.spark.sql.hive/HiveExternalCatalogVersionsSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/) is unrelated.


---




[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-05-25 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/21433
  
Done!


---




[GitHub] spark pull request #21433: [SPARK-23820][CORE] Enable use of long form of ca...

2018-05-25 Thread michaelmior
GitHub user michaelmior opened a pull request:

https://github.com/apache/spark/pull/21433

[SPARK-23820][CORE] Enable use of long form of callsite in logs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/michaelmior/spark long-callsite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21433.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21433


commit 9800d2eddf100d32223e2ef3cae1ad3279368454
Author: Michael Mior 
Date:   2018-03-28T20:57:41Z

[SPARK-23820][CORE] Enable use of long form of callsite in logs




---




[GitHub] spark issue #12162: [SPARK-14289][WIP] Support multiple eviction strategies ...

2017-11-15 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/12162
  
As best I can tell, the code that was pushed here is incomplete. However, 
Spark's default cache eviction policy is LRU. You can find the code which 
performs eviction 
[here](https://github.com/apache/spark/blob/1e82335413bc2384073ead0d6d581c862036d0f5/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L501). 
It works by storing all the data in a `LinkedHashMap` 
configured to track which elements were accessed most recently.
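
The access-ordering behaviour described above can be seen with a plain `java.util.LinkedHashMap` outside Spark. This is a minimal sketch of the mechanism, not Spark's actual `MemoryStore` code:

```scala
import java.util.LinkedHashMap

// accessOrder = true (third constructor argument) makes iteration order
// reflect recency of access, so the first key in the iteration is the
// least recently used entry -- the natural eviction candidate.
val cache = new LinkedHashMap[String, Int](16, 0.75f, true)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
cache.get("a")                        // touching "a" makes it most recently used
val lru = cache.keySet.iterator.next  // least recently used entry
println(lru)                          // "b"
```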


---




[GitHub] spark issue #9428: [SPARK-8582][Core]Optimize checkpointing to avoid computi...

2017-10-31 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/9428
  
@zsxwing Thanks for the pointer. It's not clear to me why this needs to be 
supported (and in fact the test no longer exists). However, I'm also not clear 
why the test fails in the first place (I compiled and ran the code), but that's 
probably because I'm relatively new to Scala and don't fully understand the 
semantics of lazy vals.
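
For context, the lazy val semantics mentioned above boil down to: the initializer runs at most once, on first access rather than at construction time. A standalone illustration (unrelated to the Spark code in the PR):

```scala
// A lazy val's body is not evaluated when the enclosing object is
// constructed; it runs on first read and the result is cached.
var initCount = 0

class Holder {
  lazy val value: Int = { initCount += 1; 42 }
}

val h = new Holder   // initializer has not run yet
println(initCount)   // 0
println(h.value)     // 42 (first access triggers evaluation)
println(h.value)     // 42 (cached; initializer does not run again)
println(initCount)   // 1
```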


---




[GitHub] spark issue #9428: [SPARK-8582][Core]Optimize checkpointing to avoid computi...

2017-10-31 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/9428
  
@zsxwing @andrewor14 Would either of you be able to explain briefly why 
this approach doesn't work?


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-10-17 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/19263
  
Thanks for the merge!


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-10-17 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/19263
  
@vanzin Made the changes you recommended. Thanks for the feedback!


---




[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...

2017-10-17 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/19263#discussion_r145203563
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -41,6 +41,22 @@ package object config {
     .bytesConf(ByteUnit.MiB)
     .createWithDefaultString("1g")
 
+  private[spark] val EVENT_LOG_COMPRESS =
+    ConfigBuilder("spark.eventLog.compress").booleanConf.createWithDefault(false)
--- End diff --

Okay, I'll change all the event logging configs.


---




[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...

2017-10-17 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/19263#discussion_r145202921
  
--- Diff: docs/configuration.md ---
@@ -714,6 +714,13 @@ Apart from these, the following properties are also available, and may be useful
 
 Property Name | Default | Meaning
 
+  spark.eventLog.blockUpdatesEnabled
+  false
+  
+    Whether to log events for every block update, if spark.eventLog.enabled is true.
--- End diff --

Will do.


---




[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...

2017-10-17 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/19263#discussion_r145202877
  
--- Diff: docs/configuration.md ---
@@ -714,6 +714,13 @@ Apart from these, the following properties are also available, and may be useful
 
 Property Name | Default | Meaning
 
+  spark.eventLog.blockUpdatesEnabled
--- End diff --

How about `spark.eventLog.logBlockUpdates.enabled`?


---




[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...

2017-10-17 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/19263#discussion_r145202605
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala ---
@@ -291,6 +292,7 @@ object EventLoggingListenerSuite {
   def getLoggingConf(logDir: Path, compressionCodec: Option[String] = None): SparkConf = {
     val conf = new SparkConf
     conf.set("spark.eventLog.enabled", "true")
+    conf.set("spark.eventLog.blockUpdatesEnabled", "true")
--- End diff --

It is being tested. I added a test for a `SparkListenerBlockUpdated` event 
that won't be generated unless the configuration is enabled.


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-10-13 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/19263
  
@jerryshao Agreed on both points. Changed!


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-10-12 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/19263
  
@vanzin Just wanted to ping since I believe I addressed your concerns. Let 
me know if it looks like any other changes are required.


---




[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...

2017-09-29 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/19263#discussion_r141916752
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -66,6 +67,7 @@ private[spark] class EventLoggingListener(
 
   private val shouldCompress = sparkConf.getBoolean("spark.eventLog.compress", false)
   private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
+  private val shouldLogBlockUpdates = sparkConf.getBoolean("spark.eventLog.blockUpdates", false)
--- End diff --

Thanks for clarifying. I opted to convert other event log settings as well 
for uniformity.


---




[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...

2017-09-29 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/19263#discussion_r141849398
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -66,6 +67,7 @@ private[spark] class EventLoggingListener(
 
   private val shouldCompress = sparkConf.getBoolean("spark.eventLog.compress", false)
   private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false)
+  private val shouldLogBlockUpdates = sparkConf.getBoolean("spark.eventLog.blockUpdates", false)
--- End diff --

@vanzin The reason I didn't is because all of the other config settings 
nearby don't. Would it make sense to change the other event log settings as 
part of this?


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-09-26 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/19263
  
@jerryshao I agree that the history server itself doesn't surface this 
information, but for detailed analysis I've found it helpful to persist the logs 
and write my own scripts to analyze them. This extra information is helpful for 
that.


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-09-18 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/19263
  
Whoops. Sorry about that. I opened the PR via the CLI so I didn't see the 
pointer on the web interface. I should have known better though. Updated!


---




[GitHub] spark pull request #19263: Optionally add block updates to log

2017-09-17 Thread michaelmior
GitHub user michaelmior opened a pull request:

https://github.com/apache/spark/pull/19263

Optionally add block updates to log

I see that block updates are not logged to the event log.
This makes sense as a default for performance reasons.
However, when trying to better understand caching behaviour for a job, I find 
it helpful to be able to log these updates.
This PR adds a configuration setting `spark.eventLog.blockUpdates` 
(defaulting to false) which allows block updates to be recorded in the log.
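
As described above, the setting could be enabled together with event logging in `spark-defaults.conf` (a sketch using the PR's initial property name, which was renamed later in the review):

```
spark.eventLog.enabled       true
spark.eventLog.blockUpdates  true
```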

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/michaelmior/spark log-block-updates

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19263


commit 2a967c368cf72316b760c5f17503658aa12b901d
Author: Michael Mior 
Date:   2017-07-12T13:32:24Z

Optionally add block updates to log




---




[GitHub] spark issue #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setName for D...

2017-05-22 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/16609
  
Is there any way to create a name without caching the result? I'm parsing 
the Spark event logs to get some additional data about jobs, and having names 
there is very helpful, but I don't want to affect the behaviour of the 
application.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #17900: [SPARK-20637][Core] Remove mention of old RDD cla...

2017-05-08 Thread michaelmior
GitHub user michaelmior opened a pull request:

https://github.com/apache/spark/pull/17900

[SPARK-20637][Core] Remove mention of old RDD classes from comments

## What changes were proposed in this pull request?

A few comments around the code mention RDD classes that do not exist 
anymore. I'm not sure of the best way to replace these, so I've just removed 
them here.

## How was this patch tested?

Only changes code comments, no testing required

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/michaelmior/spark remove-old-rdds

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17900.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17900


commit eb45ce2cb3228d0b95c7b873e6ad82b6c4f15e15
Author: Michael Mior 
Date:   2017-05-08T15:38:36Z

Remove mention of old RDD classes from comments




---




[GitHub] spark issue #12162: [SPARK-14289][WIP] Support multiple eviction strategies ...

2017-05-05 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/12162
  
This branch appears to be incomplete. The configuration parameter 
`entryEvictionPolicy` does not exist and there is a good chunk of the code that 
does not do anything.


---
