[jira] [Created] (SPARK-15966) Fix markdown for Spark Monitoring
Dhruve Ashar created SPARK-15966: Summary: Fix markdown for Spark Monitoring Key: SPARK-15966 URL: https://issues.apache.org/jira/browse/SPARK-15966 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.0.0 Reporter: Dhruve Ashar Priority: Trivial The markdown for Spark monitoring needs to be fixed. http://spark.apache.org/docs/2.0.0-preview/monitoring.html
[jira] [Updated] (SPARK-15966) Fix markdown for Spark Monitoring
[ https://issues.apache.org/jira/browse/SPARK-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-15966: - Description: The markdown for Spark monitoring needs to be fixed. http://spark.apache.org/docs/2.0.0-preview/monitoring.html The closing tag is missing for `spark.ui.view.acls.groups`, which is causing the markdown to render incorrectly. was: The markdown for Spark monitoring needs to be fixed. http://spark.apache.org/docs/2.0.0-preview/monitoring.html > Fix markdown for Spark Monitoring > - > > Key: SPARK-15966 > URL: https://issues.apache.org/jira/browse/SPARK-15966 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.0.0 >Reporter: Dhruve Ashar >Priority: Trivial > > The markdown for Spark monitoring needs to be fixed. > http://spark.apache.org/docs/2.0.0-preview/monitoring.html > The closing tag is missing for `spark.ui.view.acls.groups`, which is causing > the markdown to render incorrectly.
[jira] [Created] (SPARK-16018) Shade netty for shuffle to work on YARN
Dhruve Ashar created SPARK-16018: Summary: Shade netty for shuffle to work on YARN Key: SPARK-16018 URL: https://issues.apache.org/jira/browse/SPARK-16018 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 2.0.0 Reporter: Dhruve Ashar Priority: Blocker Fix For: 2.0.0 We replaced LazyFileRegion with netty's DefaultFileRegion. Because of this (https://issues.apache.org/jira/browse/SPARK-15178), we have a dependency on netty 4.0.25.Final or greater, but Hadoop is on 4.0.23.Final, so the shuffle jar won't load in the NodeManager as it's trying to find a newer version of the class. We need to shade netty's classes in order to load the latest version we need and not depend on the one loaded by Hadoop.
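As a rough illustration of the fix (not the actual Spark build change — Spark's build uses the maven-shade-plugin, and the relocated package name below is an assumption), a relocation rule along these lines lets the YARN shuffle service jar carry its own netty classes instead of picking up the older 4.0.23.Final copy on the NodeManager classpath:

{code:java}
// build.sbt sketch using the sbt-assembly plugin (illustrative only).
// Relocate netty so the shuffle service loads its own 4.0.25+ classes
// rather than the ones Hadoop already has on the NodeManager classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("io.netty.**" -> "org.spark_project.io.netty.@1").inAll
)
{code}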
[jira] [Created] (SPARK-14572) Update Config Doc to specify -Xms in extraJavaOptions
Dhruve Ashar created SPARK-14572: Summary: Update Config Doc to specify -Xms in extraJavaOptions Key: SPARK-14572 URL: https://issues.apache.org/jira/browse/SPARK-14572 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.6.0 Reporter: Dhruve Ashar Priority: Minor Fix For: 2.0.0 [SPARK-12384|https://issues.apache.org/jira/browse/SPARK-12384] allows the user to specify the initial heap memory (-Xms) through *.extraJavaOptions. We have to update the entries in the configuration docs to mention this behavior, compared to the previous one where passing heap memory settings through *.extraJavaOptions was not allowed.
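A minimal sketch of the behavior the doc update should describe (the property values here are examples only, not taken from the issue):

{code:java}
import org.apache.spark.{SparkConf, SparkContext}

// After SPARK-12384, an initial heap size (-Xms) can be passed through
// extraJavaOptions; previously such flags were rejected. The driver-side
// equivalent (spark.driver.extraJavaOptions) still has to be supplied via
// spark-submit or spark-defaults.conf, since the driver JVM is already running.
val conf = new SparkConf()
  .setAppName("xms-example")
  .set("spark.executor.extraJavaOptions", "-Xms4g -XX:+UseG1GC")
val sc = new SparkContext(conf)
{code}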
[jira] [Updated] (SPARK-16441) Spark application hang when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-16441: - External issue URL: (was: https://issues.apache.org/jira/browse/SPARK-15703) > Spark application hang when dynamic allocation is enabled > - > > Key: SPARK-16441 > URL: https://issues.apache.org/jira/browse/SPARK-16441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2 > Environment: hadoop 2.7.2 spark1.6.2 >Reporter: cen yuhai > > spark application are waiting for rpc response all the time and spark > listener are blocked by dynamic allocation. Executors can not connect to > driver and lost. > "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 > tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070fdb94f8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436) > - locked <0x828a8960> (a > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend) > at > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 > nid=0xcec9 waiting for monitor entry [0x7fa2b3dfc000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:618) > - waiting to lock <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > 
org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) >
[jira] [Updated] (SPARK-16441) Spark application hang when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-16441: - External issue URL: https://issues.apache.org/jira/browse/SPARK-15703 > Spark application hang when dynamic allocation is enabled > - > > Key: SPARK-16441 > URL: https://issues.apache.org/jira/browse/SPARK-16441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2 > Environment: hadoop 2.7.2 spark1.6.2 >Reporter: cen yuhai > > spark application are waiting for rpc response all the time and spark > listener are blocked by dynamic allocation. Executors can not connect to > driver and lost. > "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 > tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070fdb94f8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436) > - locked <0x828a8960> (a > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend) > at > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 > nid=0xcec9 waiting for monitor entry [0x7fa2b3dfc000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:618) > - waiting to lock <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > 
org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) > at
[jira] [Commented] (SPARK-16441) Spark application hang when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392049#comment-15392049 ] Dhruve Ashar commented on SPARK-16441: -- How large is the data being sent to driver? And is this consistently reproducible? > Spark application hang when dynamic allocation is enabled > - > > Key: SPARK-16441 > URL: https://issues.apache.org/jira/browse/SPARK-16441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2 > Environment: hadoop 2.7.2 spark1.6.2 >Reporter: cen yuhai > > spark application are waiting for rpc response all the time and spark > listener are blocked by dynamic allocation. Executors can not connect to > driver and lost. > "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 > tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070fdb94f8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436) > - locked <0x828a8960> (a > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend) > at > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 > nid=0xcec9 waiting for monitor entry [0x7fa2b3dfc000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:618) > - waiting to lock <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > 
org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at >
[jira] [Updated] (SPARK-16441) Spark application hang when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-16441: - Affects Version/s: 2.0.0 > Spark application hang when dynamic allocation is enabled > - > > Key: SPARK-16441 > URL: https://issues.apache.org/jira/browse/SPARK-16441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 > Environment: hadoop 2.7.2 spark1.6.2 >Reporter: cen yuhai > > spark application are waiting for rpc response all the time and spark > listener are blocked by dynamic allocation. Executors can not connect to > driver and lost. > "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 > tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070fdb94f8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436) > - locked <0x828a8960> (a > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend) > at > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 > nid=0xcec9 waiting for monitor entry [0x7fa2b3dfc000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:618) > - waiting to lock <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > 
org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) > at
[jira] [Updated] (SPARK-15703) Spark UI doesn't show all tasks as completed when it should
[ https://issues.apache.org/jira/browse/SPARK-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-15703: - Attachment: SparkListenerBus .png spark-dynamic-executor-allocation.png > Spark UI doesn't show all tasks as completed when it should > --- > > Key: SPARK-15703 > URL: https://issues.apache.org/jira/browse/SPARK-15703 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Thomas Graves >Priority: Critical > Attachments: Screen Shot 2016-06-01 at 11.21.32 AM.png, Screen Shot > 2016-06-01 at 11.23.48 AM.png, SparkListenerBus .png, > spark-dynamic-executor-allocation.png > > > The Spark UI doesn't seem to be showing all the tasks and metrics. > I ran a job with 100000 tasks but Detail stage page says it completed 93029: > Summary Metrics for 93029 Completed Tasks > The Stages for all jobs pages list that only 89519/100000 tasks finished but > its completed. The metrics for shuffled write and input are also incorrect. > I will attach screen shots. > I checked the logs and it does show that all the tasks actually finished. > 16/06/01 16:15:42 INFO TaskSetManager: Finished task 59880.0 in stage 2.0 > (TID 54038) in 265309 ms on 10.213.45.51 (100000/100000) > 16/06/01 16:15:42 INFO YarnClusterScheduler: Removed TaskSet 2.0, whose tasks > have all completed, from pool
[jira] [Updated] (SPARK-15703) Spark UI doesn't show all tasks as completed when it should
[ https://issues.apache.org/jira/browse/SPARK-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-15703: - Component/s: Scheduler > Spark UI doesn't show all tasks as completed when it should > --- > > Key: SPARK-15703 > URL: https://issues.apache.org/jira/browse/SPARK-15703 > Project: Spark > Issue Type: Bug > Components: Scheduler, Web UI >Affects Versions: 2.0.0 >Reporter: Thomas Graves >Priority: Critical > Attachments: Screen Shot 2016-06-01 at 11.21.32 AM.png, Screen Shot > 2016-06-01 at 11.23.48 AM.png, SparkListenerBus .png, > spark-dynamic-executor-allocation.png > > > The Spark UI doesn't seem to be showing all the tasks and metrics. > I ran a job with 100000 tasks but Detail stage page says it completed 93029: > Summary Metrics for 93029 Completed Tasks > The Stages for all jobs pages list that only 89519/100000 tasks finished but > its completed. The metrics for shuffled write and input are also incorrect. > I will attach screen shots. > I checked the logs and it does show that all the tasks actually finished. > 16/06/01 16:15:42 INFO TaskSetManager: Finished task 59880.0 in stage 2.0 > (TID 54038) in 265309 ms on 10.213.45.51 (100000/100000) > 16/06/01 16:15:42 INFO YarnClusterScheduler: Removed TaskSet 2.0, whose tasks > have all completed, from pool
[jira] [Commented] (SPARK-15703) Spark UI doesn't show all tasks as completed when it should
[ https://issues.apache.org/jira/browse/SPARK-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384331#comment-15384331 ] Dhruve Ashar commented on SPARK-15703: -- Here are some of the findings: LiveListenerBus replaces the AsynchronousListenerBus. With dynamic allocation enabled and maximum executors set to ~2000, I am consistently seeing excessive messages being dropped for an input data size of 300GB. These events are being dropped (the UI gets messed up here) because the event queue is not being drained fast enough. From the thread dumps, the event queue dispatcher freezes up momentarily, during which the queue gets full in a short span and messages are dropped; once it's active again, the queue clears up fast. The race condition happens in ExecutorAllocationManager because of the synchronization, and the dispatcher thread waits for the locks to be released. See attached dumps. The remedy for this is twofold: 1 - Decouple the event dispatch and the handling of dynamic executor allocation. 2 - Make the listener event queue size configurable. For users who want to run with smaller heartbeat intervals, the no. of events floating around would be large and it would be helpful to have the flexibility to tune this. > Spark UI doesn't show all tasks as completed when it should > --- > > Key: SPARK-15703 > URL: https://issues.apache.org/jira/browse/SPARK-15703 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Thomas Graves >Priority: Critical > Attachments: Screen Shot 2016-06-01 at 11.21.32 AM.png, Screen Shot > 2016-06-01 at 11.23.48 AM.png, SparkListenerBus .png, > spark-dynamic-executor-allocation.png > > > The Spark UI doesn't seem to be showing all the tasks and metrics. > I ran a job with 100000 tasks but Detail stage page says it completed 93029: > Summary Metrics for 93029 Completed Tasks > The Stages for all jobs pages list that only 89519/100000 tasks finished but > its completed. The metrics for shuffled write and input are also incorrect. > I will attach screen shots. > I checked the logs and it does show that all the tasks actually finished. > 16/06/01 16:15:42 INFO TaskSetManager: Finished task 59880.0 in stage 2.0 > (TID 54038) in 265309 ms on 10.213.45.51 (100000/100000) > 16/06/01 16:15:42 INFO YarnClusterScheduler: Removed TaskSet 2.0, whose tasks > have all completed, from pool
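As a rough sketch of what remedy (2) from the comment above would look like from an application's point of view (the property name and value are assumptions for illustration, not something the comment defines):

{code:java}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: enlarge the listener bus event queue so bursts of
// task-end / metrics-update events are less likely to be dropped.
// The property name is an assumption for this sketch.
val conf = new SparkConf()
  .set("spark.scheduler.listenerbus.eventqueue.size", "100000")
val sc = new SparkContext(conf)
{code}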
[jira] [Commented] (SPARK-16441) Spark application hang when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384573#comment-15384573 ] Dhruve Ashar commented on SPARK-16441: -- Does the application hang consistently when dynamic allocation is enabled? Or you hit it infrequently? > Spark application hang when dynamic allocation is enabled > - > > Key: SPARK-16441 > URL: https://issues.apache.org/jira/browse/SPARK-16441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2 > Environment: hadoop 2.7.2 spark1.6.2 >Reporter: cen yuhai > > spark application are waiting for rpc response all the time and spark > listener are blocked by dynamic allocation. Executors can not connect to > driver and lost. > "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 > tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070fdb94f8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436) > - locked <0x828a8960> (a > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend) > at > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 > nid=0xcec9 waiting for monitor entry [0x7fa2b3dfc000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:618) > - waiting to lock <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > 
org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at >
[jira] [Commented] (SPARK-16441) Spark application hang when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384732#comment-15384732 ] Dhruve Ashar commented on SPARK-16441: -- [~cenyuhai] have you started looking at this one? If not then I can look at it. > Spark application hang when dynamic allocation is enabled > - > > Key: SPARK-16441 > URL: https://issues.apache.org/jira/browse/SPARK-16441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2 > Environment: hadoop 2.7.2 spark1.6.2 >Reporter: cen yuhai > > spark application are waiting for rpc response all the time and spark > listener are blocked by dynamic allocation. Executors can not connect to > driver and lost. > "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 > tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070fdb94f8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436) > - locked <0x828a8960> (a > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend) > at > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 > nid=0xcec9 waiting for monitor entry [0x7fa2b3dfc000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:618) > - waiting to lock <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > 
org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at >
[jira] [Commented] (SPARK-17417) Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)
[ https://issues.apache.org/jira/browse/SPARK-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468622#comment-15468622 ] Dhruve Ashar commented on SPARK-17417: -- Thanks for the suggestion. I'll work on the changes and submit a PR. > Fix # of partitions for RDD while checkpointing - Currently limited by > 10000(%05d) > -- > > Key: SPARK-17417 > URL: https://issues.apache.org/jira/browse/SPARK-17417 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Dhruve Ashar > > Spark currently assumes # of partitions to be less than 100000 and uses %05d > padding. > If we exceed this no., the sort logic in ReliableCheckpointRDD gets messed up > and fails. This is because the part files are sorted and compared as strings. > This leads filename order to be part-10000, part-100000, ... instead of > part-10000, part-10001, ..., part-100000 and while reconstructing the > checkpointed RDD the job fails. > Possible solutions: > - Bump the padding to allow more partitions or > - Sort the part files extracting a sub-portion as string and then verify the > RDD
[jira] [Created] (SPARK-17417) Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)
Dhruve Ashar created SPARK-17417: Summary: Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d) Key: SPARK-17417 URL: https://issues.apache.org/jira/browse/SPARK-17417 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Dhruve Ashar Spark currently assumes # of partitions to be less than 100000 and uses %05d padding. If we exceed this no., the sort logic in ReliableCheckpointRDD gets messed up and fails. This is because the part files are sorted and compared as strings. This leads filename order to be part-10000, part-100000, ... instead of part-10000, part-10001, ..., part-100000 and while reconstructing the checkpointed RDD the job fails. Possible solutions: - Bump the padding to allow more partitions or - Sort the part files extracting a sub-portion as string and then verify the RDD
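A minimal sketch of the second option, ordering part files by their numeric index rather than as raw strings (the helper below is hypothetical and only illustrates the idea; it is not the ReliableCheckpointRDD code):

{code:java}
// Hypothetical helper: sort checkpoint part files numerically so that
// part-100000 lands after part-99999 instead of between part-10000 and part-10001.
def sortPartFiles(names: Seq[String]): Seq[String] = {
  val PartFile = """part-(\d+)""".r
  names.sortBy {
    case PartFile(idx) => idx.toInt
    case _             => Int.MaxValue // push anything unexpected to the end
  }
}

sortPartFiles(Seq("part-100000", "part-10001", "part-09999"))
// => Seq("part-09999", "part-10001", "part-100000")
{code}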
[jira] [Created] (SPARK-17365) Kill multiple executors together to reduce lock contention
Dhruve Ashar created SPARK-17365: Summary: Kill multiple executors together to reduce lock contention Key: SPARK-17365 URL: https://issues.apache.org/jira/browse/SPARK-17365 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Dhruve Ashar To regulate pending and running executors we determine the executors which are eligible to be killed and kill them one at a time in a loop rather than together. Each kill does an RPC call and is synchronized, leading to lock contention for the SparkListenerBus. Side effect - the listener bus is blocked while we iteratively remove executors.
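A rough sketch of the intended change, expressed with the public SparkContext API for illustration (the real fix would live inside ExecutorAllocationManager; the executor ids below are made up):

{code:java}
// Assume sc is the running SparkContext and these are the idle executors
// that ExecutorAllocationManager decided to remove.
val executorsToKill: Seq[String] = Seq("1", "7", "42")

// Current behavior (simplified): one synchronized RPC per executor,
// blocking the listener bus each time.
// executorsToKill.foreach(id => sc.killExecutor(id))

// Proposed behavior: a single batched request.
sc.killExecutors(executorsToKill)
{code}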
[jira] [Commented] (SPARK-16441) Spark application hang when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488528#comment-15488528 ] Dhruve Ashar commented on SPARK-16441: -- So all of these are related in one way or the other. I will share my observations made so far: I have taken multiple thread dumps of these and analyzed what is causing the SparkListenerBus to block. The ExecutorAllocationManager is heavily synchronized and it's a hotspot for contention, especially when schedule is being invoked every 100ms to balance the executors. So for a spark job running thousands of executors, any change in the status of these executors with dynamic allocation enabled leads to frequent firing of events. This causes contention and leads to blocking, especially if the calls are remote RPCs. Also, the current design of the listener bus is such that it waits for every listener to process the event before it can proceed to deliver the next event from the queue. Any wait to acquire the locks leads to the event queue filling up fast and events being dropped. Logging individual execution times does not necessarily show updateAndSync consuming the majority of the time, and hence we are working on reducing the lock contention and minimizing the RPC calls. Also a parameter to look at would be the heartbeat interval. Having a small interval for a very large no. of executors will aggravate the problem as you would be getting frequent ExecutorMetricsUpdate events from the running executors. > Spark application hang when dynamic allocation is enabled > - > > Key: SPARK-16441 > URL: https://issues.apache.org/jira/browse/SPARK-16441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 > Environment: hadoop 2.7.2 spark1.6.2 >Reporter: cen yuhai > > spark application are waiting for rpc response all the time and spark > listener are blocked by dynamic allocation. Executors can not connect to > driver and lost. 
> "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 > tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070fdb94f8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436) > - locked <0x828a8960> (a > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend) > at > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 > nid=0xcec9 waiting for monitor entry [0x7fa2b3dfc000] >java.lang.Thread.State: BLOCKED (on object monitor) > at >
[jira] [Commented] (SPARK-16441) Spark application hang when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487224#comment-15487224 ] Dhruve Ashar commented on SPARK-16441: -- There was a patch which was recently contributed which reduces the deadlocking. You can check the details here => https://github.com/apache/spark/commit/a0aac4b775bc8c275f96ad0fbf85c9d8a3690588 Also I am working on a patch which improves this further and should be able to complete it soon. Its a WIP and I have a PR up for it [https://github.com/apache/spark/pull/14926]. Let us know if your issue is resolved/minimized with the patch. > Spark application hang when dynamic allocation is enabled > - > > Key: SPARK-16441 > URL: https://issues.apache.org/jira/browse/SPARK-16441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 > Environment: hadoop 2.7.2 spark1.6.2 >Reporter: cen yuhai > > spark application are waiting for rpc response all the time and spark > listener are blocked by dynamic allocation. Executors can not connect to > driver and lost. > "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 > tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070fdb94f8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436) > - locked <0x828a8960> (a > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend) > at > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 > nid=0xcec9 waiting for monitor entry 
[0x7fa2b3dfc000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:618) > - waiting to lock <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at >
[jira] [Commented] (SPARK-17417) Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)
[ https://issues.apache.org/jira/browse/SPARK-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546792#comment-15546792 ] Dhruve Ashar commented on SPARK-17417: -- [~srowen] AFAIU the checkpointing mechanism in Spark core, the recovery of an RDD from a checkpoint is limited to an application attempt. Spark Streaming mentions that it can recover metadata/RDDs from checkpointed data across application attempts. Please correct me if I have missed something here. With this understanding it wouldn't be necessary for the code to parse the old format, as the recovery would be done using the same Spark jar which was used to launch it. Also, why is it that we are not cleaning up the checkpointed directory on sc.close? > Fix # of partitions for RDD while checkpointing - Currently limited by > 10000(%05d) > -- > > Key: SPARK-17417 > URL: https://issues.apache.org/jira/browse/SPARK-17417 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Dhruve Ashar > > Spark currently assumes # of partitions to be less than 100000 and uses %05d > padding. > If we exceed this no., the sort logic in ReliableCheckpointRDD gets messed up > and fails. This is because the part files are sorted and compared as strings. > This leads filename order to be part-10000, part-100000, ... instead of > part-10000, part-10001, ..., part-100000 and while reconstructing the > checkpointed RDD the job fails. > Possible solutions: > - Bump the padding to allow more partitions or > - Sort the part files extracting a sub-portion as string and then verify the > RDD
[jira] [Updated] (SPARK-17417) Fix sorting of part files while reconstructing RDD/partition from checkpointed files.
[ https://issues.apache.org/jira/browse/SPARK-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-17417: - Summary: Fix sorting of part files while reconstructing RDD/partition from checkpointed files. (was: Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)) > Fix sorting of part files while reconstructing RDD/partition from > checkpointed files. > - > > Key: SPARK-17417 > URL: https://issues.apache.org/jira/browse/SPARK-17417 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Dhruve Ashar > > Spark currently assumes # of partitions to be less than 100000 and uses %05d > padding. > If we exceed this no., the sort logic in ReliableCheckpointRDD gets messed up > and fails. This is because the part files are sorted and compared as strings. > This leads filename order to be part-10000, part-100000, ... instead of > part-10000, part-10001, ..., part-100000 and while reconstructing the > checkpointed RDD the job fails. > Possible solutions: > - Bump the padding to allow more partitions or > - Sort the part files extracting a sub-portion as string and then verify the > RDD
[jira] [Commented] (SPARK-17649) Log how many Spark events got dropped in LiveListenerBus
[ https://issues.apache.org/jira/browse/SPARK-17649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531080#comment-15531080 ] Dhruve Ashar commented on SPARK-17649: -- I get the idea. While you certainly cannot filter out a specific type of message, you could increase the heartbeat interval to reduce the executor metrics update events to an acceptable degree. This would be significant while running with a large no. of executors. The stats would show the effect of it on the event bus. > Log how many Spark events got dropped in LiveListenerBus > > > Key: SPARK-17649 > URL: https://issues.apache.org/jira/browse/SPARK-17649 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 1.6.3, 2.0.2, 2.1.0 > > > Log how many Spark events got dropped in LiveListenerBus so that the user can > get insights on how to set a correct event queue size.
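For instance, the metrics-update traffic mentioned in the comment above can be thinned out by raising the heartbeat interval (the value below is only an example; the default is 10s):

{code:java}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative: fewer ExecutorMetricsUpdate events reach the listener bus,
// at the cost of coarser executor metrics and slower heartbeat reporting.
val conf = new SparkConf()
  .set("spark.executor.heartbeatInterval", "30s")
val sc = new SparkContext(conf)
{code}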
[jira] [Commented] (SPARK-17649) Log how many Spark events got dropped in LiveListenerBus
[ https://issues.apache.org/jira/browse/SPARK-17649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526426#comment-15526426 ] Dhruve Ashar commented on SPARK-17649: -- [~zsxwing] Since we are logging the dropped event count, it would be beneficial to provide some stats on how many events of a specific type were dropped. From what I have seen, the majority of the dropped events are executor metrics updates. It would make the log message a bit longer, but the information will be helpful. We could add it as a debug log as well. Let me know your thoughts. > Log how many Spark events got dropped in LiveListenerBus > > > Key: SPARK-17649 > URL: https://issues.apache.org/jira/browse/SPARK-17649 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 1.6.3, 2.0.2, 2.1.0 > > > Log how many Spark events got dropped in LiveListenerBus so that the user can > get insights on how to set a correct event queue size.
[jira] [Commented] (SPARK-21460) Spark dynamic allocation breaks when ListenerBus event queue runs full
[ https://issues.apache.org/jira/browse/SPARK-21460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094689#comment-16094689 ] Dhruve Ashar commented on SPARK-21460: -- [~Tagar] Can you attach the driver logs so that they help with investigating? I am not able to reproduce this issue on my end and would like to check more on this. Also, how frequently are you hitting this? > Spark dynamic allocation breaks when ListenerBus event queue runs full > -- > > Key: SPARK-21460 > URL: https://issues.apache.org/jira/browse/SPARK-21460 > Project: Spark > Issue Type: Bug > Components: Scheduler, YARN >Affects Versions: 2.0.0, 2.0.2, 2.1.0, 2.1.1, 2.2.0 > Environment: Spark 2.1 > Hadoop 2.6 >Reporter: Ruslan Dautkhanov >Priority: Critical > Labels: dynamic_allocation, performance, scheduler, yarn > > When ListenerBus event queue runs full, spark dynamic allocation stops > working - Spark fails to shrink number of executors when there are no active > jobs (Spark driver "thinks" there are active jobs since it didn't capture > when they finished) . > ps. What's worse it also makes Spark flood YARN RM with reservation requests, > so YARN preemption doesn't function properly too (we're on Spark 2.1 / Hadoop > 2.6).
[jira] [Created] (SPARK-21500) Update Shuffle Fetch bookkeeping immediately after receiving remote block
Dhruve Ashar created SPARK-21500: Summary: Update Shuffle Fetch bookkeeping immediately after receiving remote block Key: SPARK-21500 URL: https://issues.apache.org/jira/browse/SPARK-21500 Project: Spark Issue Type: Task Components: Shuffle, Spark Core Affects Versions: 2.2.0 Reporter: Dhruve Ashar Priority: Minor Currently we update any remote shuffle fetch related parameters only when we start processing the FetchResult rather than updating them immediately after receiving them. This would speed up sending requests for remote shuffle fetch.
[jira] [Created] (SPARK-21243) Limit the number of maps in a single shuffle fetch
Dhruve Ashar created SPARK-21243: Summary: Limit the number of maps in a single shuffle fetch Key: SPARK-21243 URL: https://issues.apache.org/jira/browse/SPARK-21243 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.1.1, 2.1.0 Reporter: Dhruve Ashar Priority: Minor Right now Spark can limit the number of parallel fetches and also the amount of data in one fetch, but a single fetch to a host could be for hundreds of blocks. In one instance we saw 450+ blocks. When you have hundreds of such fetches and thousands of reducers fetching, that becomes a lot of metadata and can run the Node Manager out of memory. We should add a config to limit the number of maps per fetch to reduce the load on the NM. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
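For context, a hedged sketch of how the existing shuffle-fetch knobs relate to the proposal; spark.reducer.maxSizeInFlight and spark.reducer.maxReqsInFlight are existing configs, while the per-host block cap on the last line only illustrates the kind of config this issue proposes and is not assumed to exist yet:
{code:java}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.reducer.maxSizeInFlight", "48m")              // existing: cap on total data in flight per reduce task
  .set("spark.reducer.maxReqsInFlight", "64")               // existing: cap on concurrent fetch requests
  .set("spark.reducer.maxBlocksInFlightPerAddress", "100")  // illustrative: cap on map output blocks fetched per host
{code}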
[jira] [Commented] (SPARK-20589) Allow limiting task concurrency per stage
[ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114628#comment-16114628 ] Dhruve Ashar commented on SPARK-20589: -- I have a patch for this and would like to have some feedback/thoughts on this approach. > Allow limiting task concurrency per stage > - > > Key: SPARK-20589 > URL: https://issues.apache.org/jira/browse/SPARK-20589 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.1.0 >Reporter: Thomas Graves > > It would be nice to have the ability to limit the number of concurrent tasks > per stage. This is useful when your spark job might be accessing another > service and you don't want to DOS that service. For instance Spark writing > to hbase or Spark doing http puts on a service. Many times you want to do > this without limiting the number of partitions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20589) Allow limiting task concurrency per stage
[ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114626#comment-16114626 ] Dhruve Ashar commented on SPARK-20589: -- Spark defines stages based on shuffle dependencies, and the application developer does not have a direct way to control this behavior. Moreover, limiting concurrency per stage would require the developer to know exactly where the stage boundaries fall and to set the max concurrency on the desired transformation. Supporting this at the transformation level would need an API change, plus a way to resolve the max concurrent tasks across all tasks in a given stage. Since the scope of that change is much broader and more involved, in the meantime I would like to propose an approach where we limit the number of concurrent tasks on a per-job basis rather than a per-stage basis. This way the developer has control over limiting a single Spark job in their application. The approach takes two steps: 1 - Specify the concurrency limit for the job group. 2 - Tag the job to be limited with that job group. So the application code looks something like this:
{code:java}
// limit concurrency for all the job/s under jobGroupId => myjob
conf.set("spark.job.myjob.maxConcurrentTasks", "10")
sc.parallelize(1 to Int.MaxValue, 1).map(x => x + 1).map(x => x - 1).map(x => x * 1).count()

// tag the job to be limited under the respective jobGroupId
sc.setJobGroup("myjob", "", false)
sc.parallelize(1 to Int.MaxValue, 1).map(x => x + 1).map(x => x - 1).map(x => x * 1).count()

// clear the tag or set it to a different value.
sc.clearJobGroup()
sc.parallelize(1 to Int.MaxValue, 1).map(x => x + 1).map(x => x - 1).map(x => x * 1).count()
{code}
> Allow limiting task concurrency per stage > - > > Key: SPARK-20589 > URL: https://issues.apache.org/jira/browse/SPARK-20589 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.1.0 >Reporter: Thomas Graves > > It would be nice to have the ability to limit the number of concurrent tasks > per stage. This is useful when your spark job might be accessing another > service and you don't want to DOS that service. For instance Spark writing > to hbase or Spark doing http puts on a service. Many times you want to do > this without limiting the number of partitions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21181) Suppress memory leak errors reported by netty
Dhruve Ashar created SPARK-21181: Summary: Suppress memory leak errors reported by netty Key: SPARK-21181 URL: https://issues.apache.org/jira/browse/SPARK-21181 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.1.0 Reporter: Dhruve Ashar Priority: Minor We are seeing netty report memory leak errors like the one below after switching to 2.1. {code} ERROR ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetection.level=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information. {code} Looking a bit deeper, Spark is not leaking any memory here, but it is confusing for the user to see the error message in the driver logs. After enabling '-Dio.netty.leakDetection.level=advanced', netty reveals the SparkSaslServer to be the source of these leaks. Sample trace: https://gist.github.com/dhruve/b299ebc35aa0a185c244a0468927daf1 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
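For reference, a hedged sketch of turning on the advanced leak reporting mentioned above through Spark's JVM-option configs; in practice these are usually supplied via spark-defaults.conf or --conf at submit time, since the driver JVM is already running by the time application code executes:
{code:java}
import org.apache.spark.SparkConf

// Ask netty to record allocation sites so the reported ByteBuf leak can be traced.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-Dio.netty.leakDetection.level=advanced")
  .set("spark.executor.extraJavaOptions", "-Dio.netty.leakDetection.level=advanced")
{code}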
[jira] [Created] (SPARK-24610) wholeTextFiles broken for small files
Dhruve Ashar created SPARK-24610: Summary: wholeTextFiles broken for small files Key: SPARK-24610 URL: https://issues.apache.org/jira/browse/SPARK-24610 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.3.1, 2.2.1 Reporter: Dhruve Ashar Spark is unable to read small files using the wholeTextFiles method when split size related configs are specified - either explicitly or if they are contained in other config files like hive-site.xml. For small sized files, the computed maxSplitSize by `WholeTextFileInputFormat` is way smaller than the default or commonly used split size of 64/128M and spark throws an exception while trying to read them. To reproduce the issue: {code:java} $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client --conf "spark.hadoop.mapreduce.input.fileinputformat.split.minsize.per.node=123456" scala> sc.wholeTextFiles("file:///etc/passwd").count java.io.IOException: Minimum split size pernode 123456 cannot be larger than maximum split size 9962 at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:200) at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(WholeTextFileRDD.scala:50) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2096) at org.apache.spark.rdd.RDD.count(RDD.scala:1158) ... 48 elided // For hdfs sc.wholeTextFiles("smallFile").count java.io.IOException: Minimum split size pernode 123456 cannot be larger than maximum split size 15 at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:200) at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(WholeTextFileRDD.scala:50) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2096) at org.apache.spark.rdd.RDD.count(RDD.scala:1158) ... 48 elided{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
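A possible workaround sketch, not taken from the ticket: clear the per-node and per-rack minimum split sizes on the SparkContext's Hadoop configuration so they can no longer exceed the small maxSplitSize computed by WholeTextFileInputFormat:
{code:java}
// Reset the minimum split sizes for this session before reading small files.
sc.hadoopConfiguration.setLong("mapreduce.input.fileinputformat.split.minsize.per.node", 0L)
sc.hadoopConfiguration.setLong("mapreduce.input.fileinputformat.split.minsize.per.rack", 0L)
sc.wholeTextFiles("file:///etc/passwd").count
{code}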
[jira] [Created] (SPARK-25468) Highlight current page index in the history server
Dhruve Ashar created SPARK-25468: Summary: Highlight current page index in the history server Key: SPARK-25468 URL: https://issues.apache.org/jira/browse/SPARK-25468 Project: Spark Issue Type: New Feature Components: Web UI Affects Versions: 2.3.1 Reporter: Dhruve Ashar Spark History Server Web UI should highlight the current page index selected for better navigation. Without it being highlighted it is difficult to identify the current page you are looking at. For example: Page 1 should be highlighted here. !image-2018-09-19-11-29-46-677.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25468) Highlight current page index in the history server
[ https://issues.apache.org/jira/browse/SPARK-25468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-25468: - Description: Spark History Server Web UI should highlight the current page index selected for better navigation. Without it being highlighted it is difficult to identify the current page you are looking at. For example: Page 1 should be highlighted here. was: Spark History Server Web UI should highlight the current page index selected for better navigation. Without it being highlighted it is difficult to identify the current page you are looking at. For example: Page 1 should be highlighted here. !image-2018-09-19-11-29-46-677.png! > Highlight current page index in the history server > -- > > Key: SPARK-25468 > URL: https://issues.apache.org/jira/browse/SPARK-25468 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 2.3.1 >Reporter: Dhruve Ashar >Priority: Trivial > > Spark History Server Web UI should highlight the current page index selected > for better navigation. Without it being highlighted it is difficult to > identify the current page you are looking at. > > For example: Page 1 should be highlighted here. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25468) Highlight current page index in the history server
[ https://issues.apache.org/jira/browse/SPARK-25468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-25468: - Description: Spark History Server Web UI should highlight the current page index selected for better navigation. Without it being highlighted it is difficult to identify the current page you are looking at. For example: Page 1 should be highlighted as shown in SparkHistoryServer.png was: Spark History Server Web UI should highlight the current page index selected for better navigation. Without it being highlighted it is difficult to identify the current page you are looking at. For example: Page 1 should be highlighted here. > Highlight current page index in the history server > -- > > Key: SPARK-25468 > URL: https://issues.apache.org/jira/browse/SPARK-25468 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 2.3.1 >Reporter: Dhruve Ashar >Priority: Trivial > Attachments: SparkHistoryServer.png > > > Spark History Server Web UI should highlight the current page index selected > for better navigation. Without it being highlighted it is difficult to > identify the current page you are looking at. > > For example: Page 1 should be highlighted as shown in SparkHistoryServer.png -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25468) Highlight current page index in the history server
[ https://issues.apache.org/jira/browse/SPARK-25468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-25468: - Attachment: SparkHistoryServer.png > Highlight current page index in the history server > -- > > Key: SPARK-25468 > URL: https://issues.apache.org/jira/browse/SPARK-25468 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 2.3.1 >Reporter: Dhruve Ashar >Priority: Trivial > Attachments: SparkHistoryServer.png > > > Spark History Server Web UI should highlight the current page index selected > for better navigation. Without it being highlighted it is difficult to > identify the current page you are looking at. > > For example: Page 1 should be highlighted here. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
Dhruve Ashar created SPARK-27107: Summary: Spark SQL Job failing because of Kryo buffer overflow with ORC Key: SPARK-27107 URL: https://issues.apache.org/jira/browse/SPARK-27107 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0, 2.3.2 Reporter: Dhruve Ashar The issue occurs while trying to read ORC data and setting the SearchArgument. {code:java} Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 9 Serialization trace: literalList (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) at com.esotericsoftware.kryo.io.Output.require(Output.java:163) at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) at org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) at org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) at scala.Option.foreach(Option.scala:257) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) at org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
[jira] [Commented] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788021#comment-16788021 ] Dhruve Ashar commented on SPARK-27107: -- [~dongjoon] can you review the PR for ORC to fix this issue? [https://github.com/apache/orc/pull/372] Once this is merged, we can fix the issue in spark as well. Until then the only workaround is to use the hive based implementation. > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > 
org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) >
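A sketch of the workaround mentioned in the comment above, assuming Spark 2.3+ where the ORC reader implementation is selectable; the path and filter are placeholders:
{code:java}
// Fall back to the Hive-based ORC reader so the native reader's Kryo-serialized
// SearchArgument (the source of the buffer overflow) is not used.
spark.conf.set("spark.sql.orc.impl", "hive")
spark.read.orc("/path/to/orc/table").filter("id = 42").show()
{code}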
[jira] [Commented] (SPARK-27112) Spark Scheduler encounters two independent Deadlocks when trying to kill executors either due to dynamic allocation or blacklisting
[ https://issues.apache.org/jira/browse/SPARK-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797253#comment-16797253 ] Dhruve Ashar commented on SPARK-27112: -- [~irashid] - I think this is a critical bug and since it is resolved we should include it in the rc8. > Spark Scheduler encounters two independent Deadlocks when trying to kill > executors either due to dynamic allocation or blacklisting > > > Key: SPARK-27112 > URL: https://issues.apache.org/jira/browse/SPARK-27112 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: Parth Gandhi >Assignee: Parth Gandhi >Priority: Major > Fix For: 2.3.4, 2.4.2, 3.0.0 > > Attachments: Screen Shot 2019-02-26 at 4.10.26 PM.png, Screen Shot > 2019-02-26 at 4.10.48 PM.png, Screen Shot 2019-02-26 at 4.11.11 PM.png, > Screen Shot 2019-02-26 at 4.11.26 PM.png > > > Recently, a few spark users in the organization have reported that their jobs > were getting stuck. On further analysis, it was found out that there exist > two independent deadlocks and either of them occur under different > circumstances. The screenshots for these two deadlocks are attached here. > We were able to reproduce the deadlocks with the following piece of code: > > {code:java} > import org.apache.hadoop.conf.Configuration > import org.apache.hadoop.fs.{FileSystem, Path} > import org.apache.spark._ > import org.apache.spark.TaskContext > // Simple example of Word Count in Scala > object ScalaWordCount { > def main(args: Array[String]) { > if (args.length < 2) { > System.err.println("Usage: ScalaWordCount ") > System.exit(1) > } > val conf = new SparkConf().setAppName("Scala Word Count") > val sc = new SparkContext(conf) > // get the input file uri > val inputFilesUri = args(0) > // get the output file uri > val outputFilesUri = args(1) > while (true) { > val textFile = sc.textFile(inputFilesUri) > val counts = textFile.flatMap(line => line.split(" ")) > .map(word => {if (TaskContext.get.partitionId == 5 && > TaskContext.get.attemptNumber == 0) throw new Exception("Fail for > blacklisting") else (word, 1)}) > .reduceByKey(_ + _) > counts.saveAsTextFile(outputFilesUri) > val conf: Configuration = new Configuration() > val path: Path = new Path(outputFilesUri) > val hdfs: FileSystem = FileSystem.get(conf) > hdfs.delete(path, true) > } > sc.stop() > } > } > {code} > > Additionally, to ensure that the deadlock surfaces up soon enough, I also > added a small delay in the Spark code here: > [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L256] > > {code:java} > executorIdToFailureList.remove(exec) > updateNextExpiryTime() > Thread.sleep(2000) > killBlacklistedExecutor(exec) > {code} > > Also make sure that the following configs are set when launching the above > spark job: > *spark.blacklist.enabled=true* > *spark.blacklist.killBlacklistedExecutors=true* > *spark.blacklist.application.maxFailedTasksPerExecutor=1* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-27107: - Description: The issue occurs while trying to read ORC data and setting the SearchArgument. {code:java} Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 9 Serialization trace: literalList (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) at com.esotericsoftware.kryo.io.Output.require(Output.java:163) at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) at org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) at org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) at scala.Option.foreach(Option.scala:257) at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) at org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) at
[jira] [Commented] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789668#comment-16789668 ] Dhruve Ashar commented on SPARK-27107: -- This is something we can consistently reproduce every single time. I do not have an example with company details stripped off that I can share for this use case at this point. Is there anything specific that you are looking for? I can try to come up with a reproducible case, but it seems to be difficult to reproduce as this is dependent on user query and the parameters that are being passed to filter the data. {quote} Could you provide a reproducible test case here? {quote} The Hive default is 4K and not 4M (that was a typo). Thanks for correcting that. {quote}BTW, the Hive default is 4K instead of 4M, isn't it? {quote} Yes. The hive implementation should fail when it exceeds the 10M limit for a SArg and the PR that I have against the Orc implementation tries to make this configurable so that spark can control the buffer size if we hit a buffer overflow error. {quote}Technically, Hive implementation also fails when it exceeds the limitation because it's a non-configurable parameter issue. {quote} > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > 
at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at >
[jira] [Commented] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790610#comment-16790610 ] Dhruve Ashar commented on SPARK-27107: -- Update: The PR was merged in the orc repository. My understanding is that we should update our pom once a new orc release is cut out. > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) >
[jira] [Commented] (SPARK-27107) Spark SQL Job failing because of Kryo buffer overflow with ORC
[ https://issues.apache.org/jira/browse/SPARK-27107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790911#comment-16790911 ] Dhruve Ashar commented on SPARK-27107: -- I verified the changes and we are no longer seeing the issue. Thanks for testing+voting the ORC RC. I think I am not on the ORC mailing list, so I might have missed the voting. > Spark SQL Job failing because of Kryo buffer overflow with ORC > -- > > Key: SPARK-27107 > URL: https://issues.apache.org/jira/browse/SPARK-27107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > The issue occurs while trying to read ORC data and setting the SearchArgument. > {code:java} > Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. > Available: 0, required: 9 > Serialization trace: > literalList > (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl) > leaves (org.apache.orc.storage.ql.io.sarg.SearchArgumentImpl) > at com.esotericsoftware.kryo.io.Output.require(Output.java:163) > at com.esotericsoftware.kryo.io.Output.writeVarLong(Output.java:614) > at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:538) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:147) > at > com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.write(DefaultSerializers.java:141) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) > at > org.apache.orc.mapred.OrcInputFormat.setSearchArgument(OrcInputFormat.java:96) > at > org.apache.orc.mapreduce.OrcInputFormat.setSearchArgument(OrcInputFormat.java:57) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:159) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(OrcFileFormat.scala:156) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.buildReaderWithPartitionValues(OrcFileFormat.scala:156) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:297) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:295) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:315) > at > 
org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.python.EvalPythonExec.doExecute(EvalPythonExec.scala:89) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at >
[jira] [Created] (SPARK-26801) Spark unable to read valid avro types
Dhruve Ashar created SPARK-26801: Summary: Spark unable to read valid avro types Key: SPARK-26801 URL: https://issues.apache.org/jira/browse/SPARK-26801 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Dhruve Ashar Currently the external avro package reads avro schemas for type records only. This is probably because of representation of InternalRow in spark sql. As a result, if the avro file has anything other than a sequence of records it fails to read it. We faced this issue earlier while trying to read primitive types. We encountered this again while trying to read an array of records. Below are code examples trying to read valid avro data showing the stack traces. {code:java} spark.read.format("avro").load("avroTypes/randomInt.avro").show java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType: "int" at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180) at scala.Option.orElse(Option.scala:289) at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178) ... 49 elided == scala> spark.read.format("avro").load("avroTypes/randomEnum.avro").show java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType: { "type" : "enum", "name" : "Suit", "symbols" : [ "SPADES", "HEARTS", "DIAMONDS", "CLUBS" ] } at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180) at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180) at scala.Option.orElse(Option.scala:289) at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178) ... 49 elided {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
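An illustrative workaround sketch, not taken from the ticket: wrap non-record values in a top-level record when producing the Avro file, since the reader only accepts record schemas. This uses the plain Avro Java API, and the file path and field names are hypothetical:
{code:java}
import java.io.File
import org.apache.avro.{Schema, SchemaBuilder}
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}

// Build a record schema that wraps a single int value.
val schema: Schema = SchemaBuilder.record("Wrapper").fields().requiredInt("value").endRecord()

// Write one wrapped value to an Avro data file.
val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
writer.create(schema, new File("/tmp/wrappedInt.avro"))
val rec = new GenericData.Record(schema)
rec.put("value", 42)
writer.append(rec)
writer.close()

// The file now has a top-level record schema, which the avro data source can read.
spark.read.format("avro").load("/tmp/wrappedInt.avro").show()
{code}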
[jira] [Commented] (SPARK-26801) Spark unable to read valid avro types
[ https://issues.apache.org/jira/browse/SPARK-26801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760178#comment-16760178 ] Dhruve Ashar commented on SPARK-26801: -- I have given a short summary of the issue in the PR description. It would be great if you could review it [~hyukjin.kwon] Thanks. > Spark unable to read valid avro types > - > > Key: SPARK-26801 > URL: https://issues.apache.org/jira/browse/SPARK-26801 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > Currently the external avro package reads avro schemas for type records only. > This is probably because of representation of InternalRow in spark sql. As a > result, if the avro file has anything other than a sequence of records it > fails to read it. > We faced this issue earlier while trying to read primitive types. We > encountered this again while trying to read an array of records. Below are > code examples trying to read valid avro data showing the stack traces. > {code:java} > spark.read.format("avro").load("avroTypes/randomInt.avro").show > java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL > StructType: > "int" > at > org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180) > at scala.Option.orElse(Option.scala:289) > at > org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178) > ... 49 elided > == > scala> spark.read.format("avro").load("avroTypes/randomEnum.avro").show > java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL > StructType: > { > "type" : "enum", > "name" : "Suit", > "symbols" : [ "SPADES", "HEARTS", "DIAMONDS", "CLUBS" ] > } > at > org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180) > at scala.Option.orElse(Option.scala:289) > at > org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178) > ... 49 elided > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26827) Support importing python modules having shared objects(.so)
[ https://issues.apache.org/jira/browse/SPARK-26827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761072#comment-16761072 ] Dhruve Ashar edited comment on SPARK-26827 at 2/5/19 6:08 PM: -- Resolution : Pass the same archive with py-files and archives option. was (Author: dhruve ashar): Pass the same archive with py-files and archives option. > Support importing python modules having shared objects(.so) > --- > > Key: SPARK-26827 > URL: https://issues.apache.org/jira/browse/SPARK-26827 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > If a user wants to import dynamic modules, specifically having .so files, > this is currently disallowed by python from a zip file. > ([https://docs.python.org/3/library/zipimport.html)] and currently spark > doesn't support this either. > Files which are passed using py-files options are placed on the PYTHONPATH, > but are not extracted. While files which are passed as archives, are > extracted but not placed on the PYTHONPATH. The dynamic modules can be loaded > if they are extracted and added to the PYTHONPATH. > > Has anyone encountered this issue before and what is the best way to go about > it? > > Some possible solutions: > 1 - Get around this issue, by passing the archive with py-files and archives > option, this extracts the archive as well as adds it to the path. Gotcha - > both have to be named the same. I have tested this and it works, but its just > a workaround. > 2 - We add a new config like py-archives which takes all the files and > extracts them and also adds them to the PYTHONPATH. Or just examine the > contents of the zip file and if it has dynamic modules then do the same. I am > happy to work on the fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26827) Support importing python modules having shared objects(.so)
[ https://issues.apache.org/jira/browse/SPARK-26827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar resolved SPARK-26827. -- Resolution: Workaround Pass the same archive with py-files and archives option. > Support importing python modules having shared objects(.so) > --- > > Key: SPARK-26827 > URL: https://issues.apache.org/jira/browse/SPARK-26827 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > If a user wants to import dynamic modules, specifically having .so files, > this is currently disallowed by python from a zip file. > ([https://docs.python.org/3/library/zipimport.html)] and currently spark > doesn't support this either. > Files which are passed using py-files options are placed on the PYTHONPATH, > but are not extracted. While files which are passed as archives, are > extracted but not placed on the PYTHONPATH. The dynamic modules can be loaded > if they are extracted and added to the PYTHONPATH. > > Has anyone encountered this issue before and what is the best way to go about > it? > > Some possible solutions: > 1 - Get around this issue, by passing the archive with py-files and archives > option, this extracts the archive as well as adds it to the path. Gotcha - > both have to be named the same. I have tested this and it works, but its just > a workaround. > 2 - We add a new config like py-archives which takes all the files and > extracts them and also adds them to the PYTHONPATH. Or just examine the > contents of the zip file and if it has dynamic modules then do the same. I am > happy to work on the fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26827) Support importing python modules having shared objects(.so)
[ https://issues.apache.org/jira/browse/SPARK-26827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761034#comment-16761034 ] Dhruve Ashar commented on SPARK-26827: -- [~holden.ka...@gmail.com] , [~irashid] any thoughts on this one? > Support importing python modules having shared objects(.so) > --- > > Key: SPARK-26827 > URL: https://issues.apache.org/jira/browse/SPARK-26827 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > If a user wants to import dynamic modules, specifically having .so files, > this is currently disallowed by python from a zip file. > ([https://docs.python.org/3/library/zipimport.html)] and currently spark > doesn't support this either. > Files which are passed using py-files options are placed on the PYTHONPATH, > but are not extracted. While files which are passed as archives, are > extracted but not placed on the PYTHONPATH. The dynamic modules can be loaded > if they are extracted and added to the PYTHONPATH. > > Has anyone encountered this issue before and what is the best way to go about > it? > > Some possible solutions: > 1 - Get around this issue, by passing the archive with py-files and archives > option, this extracts the archive as well as adds it to the path. Gotcha - > both have to be named the same. I have tested this and it works, but its just > a workaround. > 2 - We add a new config like py-archives which takes all the files and > extracts them and also adds them to the PYTHONPATH. Or just examine the > contents of the zip file and if it has dynamic modules then do the same. I am > happy to work on the fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26827) Support importing python modules having shared objects(.so)
Dhruve Ashar created SPARK-26827: Summary: Support importing python modules having shared objects(.so) Key: SPARK-26827 URL: https://issues.apache.org/jira/browse/SPARK-26827 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 2.4.0, 2.3.2 Reporter: Dhruve Ashar If a user wants to import dynamic modules, specifically ones containing .so files, this is currently disallowed by Python from a zip file (https://docs.python.org/3/library/zipimport.html), and currently Spark doesn't support this either. Files which are passed using the py-files option are placed on the PYTHONPATH but are not extracted, while files which are passed as archives are extracted but not placed on the PYTHONPATH. The dynamic modules can be loaded if they are extracted and added to the PYTHONPATH. Has anyone encountered this issue before, and what is the best way to go about it? Some possible solutions: 1 - Get around this issue by passing the archive with both the py-files and archives options, which extracts the archive as well as adding it to the path. Gotcha - both have to be named the same. I have tested this and it works, but it's just a workaround. 2 - We add a new config like py-archives which takes all the files, extracts them and also adds them to the PYTHONPATH. Or just examine the contents of the zip file and, if it has dynamic modules, do the same. I am happy to work on the fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
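To illustrate the limitation described in the report above, here is a small hypothetical Python sketch (module and archive names are made up, not from the issue): zipimport cannot load a compiled extension straight out of a zip, but extracting the archive and putting the extracted directory on sys.path lets the import succeed, which is exactly what the suggested workaround achieves on the executors.

{code:python}
import importlib
import sys
import tempfile
import zipfile

# Hypothetical archive containing a compiled extension module,
# e.g. fastmod.cpython-37m-x86_64-linux-gnu.so (names are illustrative).
archive = "deps.zip"

# zipimport cannot load the .so directly from the zip, so extract it first.
extract_dir = tempfile.mkdtemp()
with zipfile.ZipFile(archive) as zf:
    zf.extractall(extract_dir)

# Once extracted and added to the path, the dynamic module imports normally.
sys.path.insert(0, extract_dir)
fastmod = importlib.import_module("fastmod")
{code}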
[jira] [Commented] (SPARK-26827) Support importing python modules having shared objects(.so)
[ https://issues.apache.org/jira/browse/SPARK-26827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761069#comment-16761069 ] Dhruve Ashar commented on SPARK-26827: -- Thanks for the response [~irashid] and [~hyukjin.kwon]. Will close this one out. > Support importing python modules having shared objects(.so) > --- > > Key: SPARK-26827 > URL: https://issues.apache.org/jira/browse/SPARK-26827 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 2.3.2, 2.4.0 >Reporter: Dhruve Ashar >Priority: Major > > If a user wants to import dynamic modules, specifically ones containing .so files, > this is currently disallowed by Python from a zip file > (https://docs.python.org/3/library/zipimport.html), and currently Spark > doesn't support this either. > Files which are passed using the py-files option are placed on the PYTHONPATH > but are not extracted, while files which are passed as archives are > extracted but not placed on the PYTHONPATH. The dynamic modules can be loaded > if they are extracted and added to the PYTHONPATH. > > Has anyone encountered this issue before, and what is the best way to go about > it? > > Some possible solutions: > 1 - Get around this issue by passing the archive with both the py-files and archives > options, which extracts the archive as well as adding it to the path. Gotcha - > both have to be named the same. I have tested this and it works, but it's just > a workaround. > 2 - We add a new config like py-archives which takes all the files, > extracts them and also adds them to the PYTHONPATH. Or just examine the > contents of the zip file and, if it has dynamic modules, do the same. I am > happy to work on the fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24149) Automatic namespaces discovery in HDFS federation
[ https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847929#comment-16847929 ] Dhruve Ashar commented on SPARK-24149: -- I also think that the HDFS client shouldn't be figuring out whether HA is enabled and then adding both NNs to the list. > Automatic namespaces discovery in HDFS federation > - > > Key: SPARK-24149 > URL: https://issues.apache.org/jira/browse/SPARK-24149 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.4.0 >Reporter: Marco Gaido >Assignee: Marco Gaido >Priority: Minor > Fix For: 2.4.0 > > > Hadoop 3 introduced HDFS federation. > Spark fails to write to different namespaces when Hadoop federation is turned > on and the cluster is secure. This happens because Spark looks for the > delegation token only for the configured defaultFS and not for all the > available namespaces. A workaround is to use the property > {{spark.yarn.access.hadoopFileSystems}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24149) Automatic namespaces discovery in HDFS federation
[ https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847923#comment-16847923 ] Dhruve Ashar commented on SPARK-24149: -- Spark should be agnostic about the namenodes for which it should get the tokens. It is either HDFS which figures this out or the user who should specify this explicitly. You should get tokens only for those namenodes which you are going to access. If the namespaces are related, viewfs does this for you. If they are unrelated, the user explicitly provides them. Unrelated namespaces may or may not use HDFS federation, and there can also be more namespaces in the federation which a user job might not access at all. > Automatic namespaces discovery in HDFS federation > - > > Key: SPARK-24149 > URL: https://issues.apache.org/jira/browse/SPARK-24149 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.4.0 >Reporter: Marco Gaido >Assignee: Marco Gaido >Priority: Minor > Fix For: 2.4.0 > > > Hadoop 3 introduced HDFS federation. > Spark fails to write to different namespaces when Hadoop federation is turned > on and the cluster is secure. This happens because Spark looks for the > delegation token only for the configured defaultFS and not for all the > available namespaces. A workaround is to use the property > {{spark.yarn.access.hadoopFileSystems}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
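For the unrelated-namespaces case described in the comment above, a minimal PySpark sketch of specifying the filesystems explicitly via the {{spark.yarn.access.hadoopFileSystems}} property mentioned in the issue; the hostnames are illustrative, not taken from the discussion.

{code:python}
from pyspark.sql import SparkSession

# Hedged sketch: the user lists exactly the namenodes the job will touch, so Spark
# requests delegation tokens only for those filesystems. Hostnames are made up.
spark = (SparkSession.builder
         .appName("explicit-fs-tokens")
         .config("spark.yarn.access.hadoopFileSystems",
                 "hdfs://nn1.example.com:8020,hdfs://nn2.example.com:8020")
         .getOrCreate())
{code}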
[jira] [Commented] (SPARK-24149) Automatic namespaces discovery in HDFS federation
[ https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847854#comment-16847854 ] Dhruve Ashar commented on SPARK-24149: -- [~mgaido] [~vanzin], The change this PR introduced is trying to explicitly figure out the list of namenodes from the Hadoop configs. I think we are duplicating the logic here, and this makes it confusing to understand, as figuring out the necessary namenodes should be transparent to the client. Rationale: - HDFS federation is used to store the data from two different namespaces on the same datanode (mostly used with unrelated namespaces). - ViewFS, on the other hand, is used for better namespace management by having different namespaces on different namenodes. But in that case, you should always be using it with viewfs://, which takes care of getting the tokens for you. (Note: this may or may not use HDFS federation.) In either case we should rely on Hadoop to give us the requested namenodes. In the use case where we want to access unrelated namespaces (often used in scenarios where different Hive tables are stored in different namespaces), we already have a config to pass in the other namenodes, and we really don't need this change. There was a follow-up PR to fix an issue caused by this behavior, to get the FS only for the specified namenodes. IMHO both of these changes are unnecessary and we should revert to the original behavior. Thoughts, comments? > Automatic namespaces discovery in HDFS federation > - > > Key: SPARK-24149 > URL: https://issues.apache.org/jira/browse/SPARK-24149 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.4.0 >Reporter: Marco Gaido >Assignee: Marco Gaido >Priority: Minor > Fix For: 2.4.0 > > > Hadoop 3 introduced HDFS federation. > Spark fails to write to different namespaces when Hadoop federation is turned > on and the cluster is secure. This happens because Spark looks for the > delegation token only for the configured defaultFS and not for all the > available namespaces. A workaround is to use the property > {{spark.yarn.access.hadoopFileSystems}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24149) Automatic namespaces discovery in HDFS federation
[ https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850096#comment-16850096 ] Dhruve Ashar commented on SPARK-24149: -- Thanks for providing the missing context. The current behavior doesn't seem to address the use case mentioned. We could have a table which is partitioned across a different namespace which is on a different cluster (NN). In this case the user has to know about the underlying namespaces and their corresponding NNs to get the tokens. The current logic seems to address only a specific use case where a table is stored across multiple namespaces (configured without viewfs), and in this case they luckily happen to be on the same cluster (using HDFS federation). What if these are on a different cluster? I would expect that if data for a given table is to be stored across different namespaces, then these namespaces should be related and addressed using viewfs. This has the advantage of getting tokens for all the NNs the data resides on, irrespective of whether the partitions happen to reside on the same or a different cluster, and is much better from a user-transparency standpoint as well, since it covers all the use cases. > Automatic namespaces discovery in HDFS federation > - > > Key: SPARK-24149 > URL: https://issues.apache.org/jira/browse/SPARK-24149 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.4.0 >Reporter: Marco Gaido >Assignee: Marco Gaido >Priority: Minor > Fix For: 2.4.0 > > > Hadoop 3 introduced HDFS federation. > Spark fails to write to different namespaces when Hadoop federation is turned > on and the cluster is secure. This happens because Spark looks for the > delegation token only for the configured defaultFS and not for all the > available namespaces. A workaround is to use the property > {{spark.yarn.access.hadoopFileSystems}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
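As a concrete illustration of the viewfs-based approach argued for above (the mount point and paths are hypothetical, not from the discussion): the job only addresses viewfs:// paths, and the client-side mount table resolves them to whichever namenodes actually hold the data, which is also where the tokens get fetched.

{code:python}
from pyspark.sql import SparkSession

# Hedged sketch: with related namespaces behind a viewfs mount table, the job
# addresses a single viewfs:// URI; resolution to the underlying namenodes
# (and token acquisition for them) is handled by the viewfs client.
# The paths below are illustrative.
spark = SparkSession.builder.appName("viewfs-example").getOrCreate()

df = spark.read.parquet("viewfs://cluster/projects/sales/events")
df.write.parquet("viewfs://cluster/user/alice/events_copy")
{code}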
[jira] [Commented] (SPARK-24149) Automatic namespaces discovery in HDFS federation
[ https://issues.apache.org/jira/browse/SPARK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852365#comment-16852365 ] Dhruve Ashar commented on SPARK-24149: -- IMHO, having a consistent way to reason about a behavior is preferable to specifying an additional config. We encountered an issue while trying to get the filesystem using a logical nameservice (that is one more reason why this is broken); see [https://github.com/apache/spark/pull/21216/files#diff-f8659513cf91c15097428c3d8dfbcc35R213] > Automatic namespaces discovery in HDFS federation > - > > Key: SPARK-24149 > URL: https://issues.apache.org/jira/browse/SPARK-24149 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.4.0 >Reporter: Marco Gaido >Assignee: Marco Gaido >Priority: Minor > Fix For: 2.4.0 > > > Hadoop 3 introduced HDFS federation. > Spark fails to write to different namespaces when Hadoop federation is turned > on and the cluster is secure. This happens because Spark looks for the > delegation token only for the configured defaultFS and not for all the > available namespaces. A workaround is to use the property > {{spark.yarn.access.hadoopFileSystems}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27937) Revert changes introduced as a part of Automatic namespace discovery [SPARK-24149]
[ https://issues.apache.org/jira/browse/SPARK-27937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859001#comment-16859001 ] Dhruve Ashar commented on SPARK-27937: -- The exception that we started encountering occurs while Spark tries to create a path for the logical nameservice (or nameservice ID) configured as part of HDFS federation. {code:java} 19/05/20 08:48:42 INFO SecurityManager: Changing modify acls groups to: 19/05/20 08:48:42 INFO SecurityManager: SecurityManager: authentication enabled; ui acls enabled; users with view permissions: Set(...); groups with view permissions: Set(); users with modify permissions: Set(); groups with modify permissions: Set(.) 19/05/20 08:48:43 INFO Client: Deleted staging directory hdfs://..:8020/user/abc/.sparkStaging/application_123456_123456 Exception in thread "main" java.io.IOException: Cannot create proxy with unresolved address: abcabcabc-nn1:8020 at org.apache.hadoop.hdfs.NameNodeProxiesClient.createNonHAProxyWithClientProtocol(NameNodeProxiesClient.java:345) at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:133) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:351) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:285) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2821) at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:100) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2892) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2874) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$5$$anonfun$apply$2.apply(YarnSparkHadoopUtil.scala:215) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$5$$anonfun$apply$2.apply(YarnSparkHadoopUtil.scala:214) at scala.Option.map(Option.scala:146) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$5.apply(YarnSparkHadoopUtil.scala:214) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$5.apply(YarnSparkHadoopUtil.scala:213) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.hadoopFSsToAccess(YarnSparkHadoopUtil.scala:213) at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager$$anonfun$1.apply(YARNHadoopDelegationTokenManager.scala:43) at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager$$anonfun$1.apply(YARNHadoopDelegationTokenManager.scala:43) at org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.scala:48) {code} > Revert changes introduced as a part of Automatic namespace discovery > [SPARK-24149] > -- > > Key: SPARK-27937 > URL: https://issues.apache.org/jira/browse/SPARK-27937 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: Dhruve Ashar >Priority: Major > > Spark fails to launch for a valid deployment of HDFS while trying to get > tokens for a logical nameservice instead of an actual namenode (with HDFS > federation enabled). > On inspecting the source code closely, it is unclear why we were doing this, and > based on the context from SPARK-24149, it solves a very specific use case of > getting the tokens only for those namenodes which are configured for HDFS > federation in the same cluster. IMHO these are better left to the user to > specify explicitly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27937) Revert changes introduced as a part of Automatic namespace discovery [SPARK-24149]
[ https://issues.apache.org/jira/browse/SPARK-27937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859001#comment-16859001 ] Dhruve Ashar edited comment on SPARK-27937 at 6/7/19 9:27 PM: -- The exception that we started encountering occurs while Spark tries to create a path for the logical nameservice (or nameservice ID) configured as part of HDFS federation, in the code here: https://github.com/apache/spark/blob/v2.4.3/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala#L215 {code:java} 19/05/20 08:48:42 INFO SecurityManager: Changing modify acls groups to: 19/05/20 08:48:42 INFO SecurityManager: SecurityManager: authentication enabled; ui acls enabled; users with view permissions: Set(...); groups with view permissions: Set(); users with modify permissions: Set(); groups with modify permissions: Set(.) 19/05/20 08:48:43 INFO Client: Deleted staging directory hdfs://..:8020/user/abc/.sparkStaging/application_123456_123456 Exception in thread "main" java.io.IOException: Cannot create proxy with unresolved address: abcabcabc-nn1:8020 at org.apache.hadoop.hdfs.NameNodeProxiesClient.createNonHAProxyWithClientProtocol(NameNodeProxiesClient.java:345) at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:133) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:351) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:285) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2821) at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:100) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2892) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2874) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$5$$anonfun$apply$2.apply(YarnSparkHadoopUtil.scala:215) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$5$$anonfun$apply$2.apply(YarnSparkHadoopUtil.scala:214) at scala.Option.map(Option.scala:146) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$5.apply(YarnSparkHadoopUtil.scala:214) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$5.apply(YarnSparkHadoopUtil.scala:213) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.hadoopFSsToAccess(YarnSparkHadoopUtil.scala:213) at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager$$anonfun$1.apply(YARNHadoopDelegationTokenManager.scala:43) at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager$$anonfun$1.apply(YARNHadoopDelegationTokenManager.scala:43) at org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.scala:48) {code} was (Author: dhruve ashar): The exception that we started encountering occurs while Spark tries to create a path for the logical nameservice (or nameservice ID) configured as part of HDFS federation. {code:java} 19/05/20 08:48:42 INFO SecurityManager: Changing modify acls groups to: 19/05/20 08:48:42 INFO SecurityManager: SecurityManager: authentication enabled; ui acls enabled; users with view permissions: Set(...); groups with view permissions: Set(); users with modify permissions: Set(); groups with modify permissions: Set(.) 19/05/20 08:48:43 INFO Client: Deleted staging directory hdfs://..:8020/user/abc/.sparkStaging/application_123456_123456 Exception in thread "main" java.io.IOException: Cannot create proxy with unresolved address: abcabcabc-nn1:8020 at org.apache.hadoop.hdfs.NameNodeProxiesClient.createNonHAProxyWithClientProtocol(NameNodeProxiesClient.java:345) at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:133) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:351) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:285) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160) at
[jira] [Created] (SPARK-27937) Revert changes introduced as a part of Automatic namespace discovery [SPARK-24149]
Dhruve Ashar created SPARK-27937: Summary: Revert changes introduced as a part of Automatic namespace discovery [SPARK-24149] Key: SPARK-27937 URL: https://issues.apache.org/jira/browse/SPARK-27937 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.3 Reporter: Dhruve Ashar Spark fails to launch for a valid deployment of HDFS while trying to get tokens for a logical nameservice instead of an actual namenode (with HDFS federation enabled). On inspecting the source code closely, it is unclear why we were doing this, and based on the context from SPARK-24149, it solves a very specific use case of getting the tokens only for those namenodes which are configured for HDFS federation in the same cluster. IMHO these are better left to the user to specify explicitly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org