[GitHub] spark issue #21849: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/21849

@vanzin addressed your comments and updated the PR
[GitHub] spark pull request #21849: [SPARK-24243][CORE] Expose exceptions from InProc...
Github user sahilTakiar commented on a diff in the pull request: https://github.com/apache/spark/pull/21849#discussion_r214110048

--- Diff: launcher/src/main/java/org/apache/spark/launcher/OutputRedirector.java ---
@@ -17,6 +17,8 @@
 package org.apache.spark.launcher;
 
+import org.apache.commons.lang.StringUtils;
--- End diff --

Copied the code into a helper method.
[GitHub] spark pull request #21849: [SPARK-24243][CORE] Expose exceptions from InProc...
Github user sahilTakiar commented on a diff in the pull request: https://github.com/apache/spark/pull/21849#discussion_r214109949

--- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
@@ -46,6 +47,18 @@ public synchronized void disconnect() {
     }
   }
 
+  /**
+   * Since {@link SparkLauncher} runs {@code spark-submit} in a sub-process, it's difficult to
--- End diff --

Updated the javadocs.
[GitHub] spark pull request #21849: [SPARK-24243][CORE] Expose exceptions from InProc...
Github user sahilTakiar commented on a diff in the pull request: https://github.com/apache/spark/pull/21849#discussion_r214109901

--- Diff: core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java ---
@@ -227,6 +220,82 @@ public void testInProcessLauncherDoesNotKillJvm() throws Exception {
     assertEquals(SparkAppHandle.State.LOST, handle.getState());
   }
 
+  @Test
+  public void testInProcessLauncherGetError() throws Exception {
+    // Because this test runs SparkLauncher in process and in client mode, it pollutes the system
+    // properties, and that can cause test failures down the test pipeline. So restore the original
+    // system properties after this test runs.
+    Map properties = new HashMap<>(System.getProperties());
+
+    SparkAppHandle handle = null;
+    try {
+      handle = new InProcessLauncher()
+        .setMaster("local")
+        .setAppResource(SparkLauncher.NO_RESOURCE)
+        .setMainClass(ErrorInProcessTestApp.class.getName())
+        .addAppArgs("hello")
+        .startApplication();
+
+      final SparkAppHandle _handle = handle;
+      eventually(Duration.ofSeconds(60), Duration.ofMillis(1000), () -> {
+        assertEquals(SparkAppHandle.State.FAILED, _handle.getState());
+      });
+
+      assertNotNull(handle.getError());
+      assertTrue(handle.getError().isPresent());
+      assertTrue(handle.getError().get().getCause() == DUMMY_EXCEPTION);
--- End diff --

Changed to `assertSame`.
[GitHub] spark issue #21849: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/21849

@vanzin could you take a look? I'm not sure if the same race conditions present in the other unit tests apply to the new ones since no `SparkContext` is being created. For now I didn't add any synchronization, but I can add some.
[GitHub] spark pull request #21849: [SPARK-24243][CORE] Expose exceptions from InProc...
GitHub user sahilTakiar opened a pull request: https://github.com/apache/spark/pull/21849

[SPARK-24243][CORE] Expose exceptions from InProcessAppHandle

## What changes were proposed in this pull request?

Adds a new method to `SparkAppHandle` called `getError` which returns the exception (if present) that caused the underlying Spark app to fail.

## How was this patch tested?

New tests added to `SparkLauncherSuite` for the new method.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sahilTakiar/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21849.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21849

commit 7e3100ed148f98078c04e5aba8a41ead8776f89b
Author: Sahil Takiar
Date: 2018-07-23T17:31:24Z

    [SPARK-24243][CORE] Expose exceptions from InProcessAppHandle

    Adds a new method to `SparkAppHandle` called `getError` which returns the exception (if present) that caused the underlying Spark app to fail.

    New tests added to `SparkLauncherSuite` for the new method.
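For readers of the archive, a minimal sketch of how a client could consume the new `getError()` method. The polling loop, app settings, and main-class name are illustrative assumptions; only `getError()` itself comes from this PR.

```java
import java.util.Optional;
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class GetErrorSketch {
  public static void main(String[] args) throws Exception {
    SparkAppHandle handle = new InProcessLauncher()
        .setMaster("local")                         // assumption: local master for the sketch
        .setAppResource(SparkLauncher.NO_RESOURCE)
        .setMainClass("com.example.MyApp")          // hypothetical main class
        .startApplication();

    // Poll until the handle reaches a terminal state.
    while (!handle.getState().isFinal()) {
      Thread.sleep(500);
    }

    // New in this PR: surface the failure cause, if the app failed.
    Optional<Throwable> error = handle.getError();
    error.ifPresent(t -> System.err.println("App failed with: " + t));
  }
}
```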
[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/20893

Wrote a test in `SparkLauncherSuite` and was able to replicate the error from HIVE-18533, and then realized the exception is only logged and then swallowed. From `SparkContext`:

```
if (_dagScheduler != null) {
  Utils.tryLogNonFatalError {
    _dagScheduler.stop()
  }
  _dagScheduler = null
}
```

Seems the failures I was seeing in HIVE-18533 are due to something else. Regardless, this is probably still a good fix, since you don't want to write to the connection unless it's open; but given that the exception is only logged and not thrown, I don't see an easy way to write a test for this.
[GitHub] spark pull request #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't c...
Github user sahilTakiar commented on a diff in the pull request: https://github.com/apache/spark/pull/20893#discussion_r177577783

--- Diff: core/src/main/scala/org/apache/spark/launcher/LauncherBackend.scala ---
@@ -114,10 +114,10 @@ private[spark] abstract class LauncherBackend {
   override def close(): Unit = {
     try {
+      _isConnected = false
       super.close()
     } finally {
       onDisconnected()
--- End diff --

Yeah, you're right, it doesn't look like `onDisconnected` is being used anywhere, but it's marked as `protected`, so I guess it's meant to be used by a sub-class (although no sub-class uses it). So agreed, this change should be safe; it doesn't change the semantics of any use of `onDisconnected`.
[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/20893

Ok, I'll work on writing a test for `SparkLauncherSuite`. The test added here was meant to cover the race condition mentioned [here](https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#bucketing-sorting-and-partitioning).
[GitHub] spark pull request #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't c...
Github user sahilTakiar commented on a diff in the pull request: https://github.com/apache/spark/pull/20893#discussion_r177208050

--- Diff: core/src/main/scala/org/apache/spark/launcher/LauncherBackend.scala ---
@@ -114,10 +114,10 @@ private[spark] abstract class LauncherBackend {
   override def close(): Unit = {
     try {
+      _isConnected = false
       super.close()
     } finally {
       onDisconnected()
--- End diff --

`_isConnected` is used in `def isConnected()`. Moving it from the `finally` block to before the call to `super.close()` avoids a race condition where a client tries to write to the connection after it has been closed.
[GitHub] spark pull request #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't c...
GitHub user sahilTakiar opened a pull request: https://github.com/apache/spark/pull/20893

[SPARK-23785][LAUNCHER] LauncherBackend doesn't check state of connection before setting state

## What changes were proposed in this pull request?

Changed the `LauncherBackend` `set` method so that it checks whether the connection is open before writing to it (uses `isConnected`).

## How was this patch tested?

None

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sahilTakiar/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20893.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20893

commit 1896236c9b79975e8add4017eb7fafc1cec59d70
Author: Sahil Takiar
Date: 2018-03-23T20:24:21Z

    [SPARK-23785][LAUNCHER] LauncherBackend doesn't check state of connection before setting state
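To make the race concrete, here is a self-contained sketch of the guarded-write pattern the PR describes. All names below (`GuardedConnection`, `send`) are hypothetical; the real fix lives in Scala's `LauncherBackend`, and this only illustrates the idea of flipping the connected flag before closing and checking it before every write.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical sketch, not Spark's actual launcher code.
class GuardedConnection implements Closeable {
  private volatile boolean connected = true;
  private final ObjectOutputStream out;

  GuardedConnection(ObjectOutputStream out) {
    this.out = out;
  }

  boolean isConnected() {
    return connected;
  }

  // The fix: only write while the connection is still open, instead of
  // letting a late state update hit a closed stream.
  synchronized void send(Object message) throws IOException {
    if (isConnected()) {
      out.writeObject(message);
      out.flush();
    }
  }

  @Override
  public synchronized void close() throws IOException {
    // Flip the flag *before* tearing down the stream, mirroring the PR's
    // move of `_isConnected = false` ahead of `super.close()`.
    connected = false;
    out.close();
  }
}
```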
[GitHub] spark pull request #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses t...
GitHub user sahilTakiar opened a pull request: https://github.com/apache/spark/pull/20815

[SPARK-23658][LAUNCHER] InProcessAppHandle uses the wrong class in getLogger

## What changes were proposed in this pull request?

Changed `Logger` in `InProcessAppHandle` to use `InProcessAppHandle` instead of `ChildProcAppHandle`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sahilTakiar/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20815.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20815

commit 1518a5af591b2254e947a60e0ec107551f2155a4
Author: Sahil Takiar
Date: 2018-03-13T18:24:20Z

    [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wrong class in getLogger
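This is the classic copy-paste logger bug: log records from one class get attributed to another. A minimal sketch with hypothetical class names (`java.util.logging` is shown, as the launcher module uses it, but treat the classes as illustrative):

```java
import java.util.logging.Logger;

class ParentHandleLike {
  // Correct: the logger is named after the class that owns it.
  private static final Logger LOG =
      Logger.getLogger(ParentHandleLike.class.getName());
}

class ChildHandleLike {
  // Bug pattern (copy-pasted from the class above): records from this
  // class would show up under ParentHandleLike's logger name.
  //   private static final Logger LOG =
  //       Logger.getLogger(ParentHandleLike.class.getName());

  // Fix: name the logger after the enclosing class.
  private static final Logger LOG =
      Logger.getLogger(ChildHandleLike.class.getName());
}
```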
[GitHub] spark pull request #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfigurati...
Github user sahilTakiar commented on a diff in the pull request: https://github.com/apache/spark/pull/19413#discussion_r142499531

--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -157,20 +157,23 @@ class HadoopRDD[K, V](
     if (conf.isInstanceOf[JobConf]) {
       logDebug("Re-using user-broadcasted JobConf")
       conf.asInstanceOf[JobConf]
-    } else if (HadoopRDD.containsCachedMetadata(jobConfCacheKey)) {
-      logDebug("Re-using cached JobConf")
-      HadoopRDD.getCachedMetadata(jobConfCacheKey).asInstanceOf[JobConf]
     } else {
-      // Create a JobConf that will be cached and used across this RDD's getJobConf() calls in the
-      // local process. The local cache is accessed through HadoopRDD.putCachedMetadata().
-      // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects.
-      // Synchronize to prevent ConcurrentModificationException (SPARK-1097, HADOOP-10456).
-      HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
-        logDebug("Creating new JobConf and caching it for later re-use")
-        val newJobConf = new JobConf(conf)
-        initLocalJobConfFuncOpt.foreach(f => f(newJobConf))
-        HadoopRDD.putCachedMetadata(jobConfCacheKey, newJobConf)
-        newJobConf
+      val newJobConf = HadoopRDD.getCachedMetadata(jobConfCacheKey)
--- End diff --

Thanks for the tip. Updated the patch.
[GitHub] spark pull request #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfigurati...
Github user sahilTakiar commented on a diff in the pull request: https://github.com/apache/spark/pull/19413#discussion_r142466295

--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -157,20 +157,23 @@ class HadoopRDD[K, V](
     if (conf.isInstanceOf[JobConf]) {
       logDebug("Re-using user-broadcasted JobConf")
       conf.asInstanceOf[JobConf]
-    } else if (HadoopRDD.containsCachedMetadata(jobConfCacheKey)) {
-      logDebug("Re-using cached JobConf")
-      HadoopRDD.getCachedMetadata(jobConfCacheKey).asInstanceOf[JobConf]
     } else {
-      // Create a JobConf that will be cached and used across this RDD's getJobConf() calls in the
-      // local process. The local cache is accessed through HadoopRDD.putCachedMetadata().
-      // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects.
-      // Synchronize to prevent ConcurrentModificationException (SPARK-1097, HADOOP-10456).
-      HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
-        logDebug("Creating new JobConf and caching it for later re-use")
-        val newJobConf = new JobConf(conf)
-        initLocalJobConfFuncOpt.foreach(f => f(newJobConf))
-        HadoopRDD.putCachedMetadata(jobConfCacheKey, newJobConf)
-        newJobConf
+      val newJobConf = HadoopRDD.getCachedMetadata(jobConfCacheKey)
--- End diff --

I could do that, but then I would have to remove the `logDebug("Re-using cached JobConf")` statement. Unless there is some way to get around that in Scala. Otherwise, I can rename the val to `cachedJobConf`.
[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/19413

Fixed the style check.
[GitHub] spark pull request #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfigurati...
GitHub user sahilTakiar opened a pull request: https://github.com/apache/spark/pull/19413

[SPARK-20466][CORE] HadoopRDD#addLocalConfiguration throws NPE

## What changes were proposed in this pull request?

Fix for SPARK-20466; the full description of the issue is in the JIRA. To summarize, `HadoopRDD` uses a metadata cache to cache `JobConf` objects. The cache uses soft references, which means the JVM can delete entries from the cache whenever there is GC pressure. `HadoopRDD#getJobConf` had a bug where it would check if the cache contained the `JobConf`, and if it did, it would get the `JobConf` from the cache and return it. This doesn't work when soft references are used, as the JVM can delete the entry between the existence check and the get call.

## How was this patch tested?

Haven't thought of a good way to test this yet, given the issue only occurs intermittently under high GC pressure. Was thinking of using mocks to verify `#getJobConf` is doing the right thing. I deleted the method `HadoopRDD#containsCachedMetadata` so that we don't hit this issue again.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sahilTakiar/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19413.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19413

commit 680f32c311e33784e11763c109488d528178efc8
Author: Sahil Takiar
Date: 2017-10-02T20:44:23Z

    [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration throws NPE
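To illustrate the check-then-get hazard the description refers to, here is a self-contained sketch. The `SoftCache` class and method names are hypothetical, not Spark's actual code (which is Scala and uses a soft-value map), but the race is the same.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the check-then-get race with soft references.
class SoftCache<K, V> {
  private final Map<K, SoftReference<V>> cache = new ConcurrentHashMap<>();

  void put(K key, V value) {
    cache.put(key, new SoftReference<>(value));
  }

  // Buggy pattern: the GC can clear the soft referent between the
  // existence check and the second dereference, so this can return null
  // even though the check just passed, and the caller NPEs downstream.
  V getBuggy(K key) {
    if (cache.containsKey(key) && cache.get(key).get() != null) {
      return cache.get(key).get();  // referent may have been cleared by now
    }
    return null;
  }

  // Fixed pattern: read once, then null-check the single snapshot,
  // mirroring the PR's switch to a lone getCachedMetadata() call.
  V getFixed(K key) {
    SoftReference<V> ref = cache.get(key);
    return (ref != null) ? ref.get() : null;
  }
}
```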
[GitHub] spark pull request #17499: [SPARK-20161][CORE] Default log4j properties file...
Github user sahilTakiar closed the pull request at: https://github.com/apache/spark/pull/17499
[GitHub] spark issue #17499: [SPARK-20161][CORE] Default log4j properties file should...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/17499

Sounds like we agree that if this should be done, it should be done in Hive. We have an open JIRA (HIVE-13517) to address this. If there are no other comments, I'll close this PR and the corresponding JIRA.
[GitHub] spark issue #17499: [SPARK-20161][CORE] Default log4j properties file should...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/17499

Thanks for taking a look, everyone. The original motivation for this PR comes from [HIVE-13517](https://issues.apache.org/jira/browse/HIVE-13517). It was said to be useful for debugging HoS applications that run on YARN. I personally can't point to a specific example where knowing the thread-id has helped debug Spark applications. The Hive JIRA mentions that the logs from different threads get mixed together, which makes it difficult to debug. The idea is that adding the thread-id to the logs will help other Spark applications too, but if that's not the case then I think we can close this. I've tagged some HoS experts in SPARK-20161 to see if they can provide more details. As Marcelo said, there is still a workaround on the Hive side for this issue: a custom log4j.properties file can always be provided.
[GitHub] spark issue #17499: [SPARK-20161][CORE] Default log4j properties file should...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/17499

I'm not sure how the community feels about adding this to the default log4j files, so posting this as a reference for now. Some more details are in the JIRA, but this can improve debuggability, especially for Spark-on-YARN.
[GitHub] spark pull request #17499: [SPARK-20161][CORE] Default log4j properties file...
GitHub user sahilTakiar opened a pull request: https://github.com/apache/spark/pull/17499

[SPARK-20161][CORE] Default log4j properties file should print thread-id in ConversionPattern

## What changes were proposed in this pull request?

Change the default log4j properties files so that they display the thread name in the log output.

## How was this patch tested?

No testing done yet

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sahilTakiar/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17499.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17499

commit 417a0ddc6f7fef8af007d13562f3681e72715bc0
Author: Sahil Takiar
Date: 2017-03-31T20:23:44Z

    [SPARK-20161][CORE] Default log4j properties file should print thread-id in ConversionPattern
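For reference, the kind of change being discussed is adding `%t` (the thread name) to the log4j 1.x `PatternLayout`. The snippet below approximates a default console appender configuration; the exact "before" pattern is an assumption, but `%t` is the standard PatternLayout conversion character for the thread name.

```properties
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# Before (approximate default): %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# After, with the thread name added via %t:
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} [%t] %p %c{1}: %m%n
```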