[jira] [Issue Comment Deleted] (SPARK-19300) Executor is waiting for lock
[ https://issues.apache.org/jira/browse/SPARK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hustfxj updated SPARK-19300:
----------------------------
    Comment: was deleted

(was: [~zsxwing] I also hit this issue; see https://issues.apache.org/jira/browse/SPARK-19883. My Netty version is already 4.0.43.Final.)

> Executor is waiting for lock
> ----------------------------
>
>                 Key: SPARK-19300
>                 URL: https://issues.apache.org/jira/browse/SPARK-19300
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: cen yuhai
>            Priority: Critical
>         Attachments: stderr.jpg, WAITING.jpg
>
> I can see that all threads in the executor are waiting for a lock:
> {code}
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:313)
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown Source)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$2.hasNext(WholeStageCodegenExec.scala:396)
> org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:138)
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
> org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:342)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:293)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> org.apache.spark.scheduler.Task.run(Task.scala:99)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:330)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
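An illustrative aside (not part of the original report): the top frames show the task thread parked inside LinkedBlockingQueue.take, which blocks the caller indefinitely when no shuffle-fetch result ever arrives, producing exactly the WAITING (parking) state in these dumps. A minimal Java sketch of that behavior, with poll(timeout) shown as the usual defensive alternative:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class TakeVsPoll {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> results = new LinkedBlockingQueue<>();

        // take() would park this thread forever on the empty queue (the WAITING
        // state in the jstack output); poll(timeout) returns null instead,
        // letting the caller detect the stall and fail fast.
        String r = results.poll(100, TimeUnit.MILLISECONDS);
        System.out.println(r == null ? "timed out" : r);

        // Once a producer delivers an element, take() returns immediately.
        results.put("block-data");
        System.out.println(results.take()); // prints "block-data"
    }
}
```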
[jira] [Commented] (SPARK-19300) Executor is waiting for lock
[ https://issues.apache.org/jira/browse/SPARK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902988#comment-15902988 ]

hustfxj commented on SPARK-19300:
---------------------------------

[~zsxwing] I also hit this issue; see https://issues.apache.org/jira/browse/SPARK-19883. My Netty version is already 4.0.43.Final.
[jira] [Closed] (SPARK-19829) The log about driver should support rolling like executor
[ https://issues.apache.org/jira/browse/SPARK-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hustfxj closed SPARK-19829.
---------------------------
    Resolution: Invalid

> The log about driver should support rolling like executor
> ---------------------------------------------------------
>
>                 Key: SPARK-19829
>                 URL: https://issues.apache.org/jira/browse/SPARK-19829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: hustfxj
>            Priority: Minor
>
> The driver's log should support rolling like the executor's; otherwise it can grow very large.
> {code:title=DriverRunner.scala|borderStyle=solid}
> // modified runDriver
> private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
>   builder.directory(baseDir)
>   def initialize(process: Process): Unit = {
>     // Old code: redirect stdout and stderr straight to files
>     // val stdout = new File(baseDir, "stdout")
>     // CommandUtils.redirectStream(process.getInputStream, stdout)
>     //
>     // val stderr = new File(baseDir, "stderr")
>     // val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
>     // val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
>     // Files.append(header, stderr, StandardCharsets.UTF_8)
>     // CommandUtils.redirectStream(process.getErrorStream, stderr)
>
>     // New code: redirect stdout and stderr through FileAppender to support rolling
>     val stdout = new File(baseDir, "stdout")
>     stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
>
>     val stderr = new File(baseDir, "stderr")
>     val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
>     val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
>     Files.append(header, stderr, StandardCharsets.UTF_8)
>     stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
>   }
>   runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
> }
> {code}

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
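The patch above relies on Spark's FileAppender to get size-based rolling for free. As a self-contained illustration of the rolling idea itself (the RollingWriter class below is a hypothetical sketch, not Spark's actual FileAppender API): once the active file exceeds a size threshold, rename it aside and keep a bounded number of backups.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical minimal size-based rolling writer, illustrating what a rolling
// appender does for driver/executor logs: cap the active file and keep backups.
public class RollingWriter {
    private final File file;
    private final long maxBytes;
    private final int maxBackups;

    public RollingWriter(File file, long maxBytes, int maxBackups) {
        this.file = file;
        this.maxBytes = maxBytes;
        this.maxBackups = maxBackups;
    }

    public void append(String line) throws IOException {
        // Roll before the write that would push the file past the threshold.
        if (file.exists() && file.length() + line.length() > maxBytes) {
            roll();
        }
        try (FileWriter w = new FileWriter(file, true)) {
            w.write(line);
        }
    }

    // Shift stdout.2 -> stdout.3, stdout.1 -> stdout.2, stdout -> stdout.1,
    // dropping the oldest backup beyond maxBackups.
    private void roll() {
        new File(file.getPath() + "." + maxBackups).delete();
        for (int i = maxBackups - 1; i >= 1; i--) {
            File from = new File(file.getPath() + "." + i);
            if (from.exists()) from.renameTo(new File(file.getPath() + "." + (i + 1)));
        }
        file.renameTo(new File(file.getPath() + ".1"));
    }

    public static void main(String[] args) throws IOException {
        File f = new File("stdout");
        RollingWriter w = new RollingWriter(f, 10, 2);
        w.append("12345678");
        w.append("12345678"); // exceeds 10 bytes -> rolls stdout to stdout.1
        System.out.println(new File("stdout.1").exists()); // prints "true"
    }
}
```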
[jira] [Commented] (SPARK-19829) The log about driver should support rolling like executor
[ https://issues.apache.org/jira/browse/SPARK-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902982#comment-15902982 ]

hustfxj commented on SPARK-19829:
---------------------------------

ok.
[jira] [Commented] (SPARK-19883) Executor is waiting for lock
[ https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902980#comment-15902980 ]

hustfxj commented on SPARK-19883:
---------------------------------

[~srowen] I saw this issue again today, so it does not appear to be resolved in Spark.

> Executor is waiting for lock
> ----------------------------
>
>                 Key: SPARK-19883
>                 URL: https://issues.apache.org/jira/browse/SPARK-19883
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: hustfxj
>         Attachments: jstack
>
> {code:borderStyle=solid}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for <0x000498c249c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 	at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> 	at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> 	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> 	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> 	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:114)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> 	at java.lang.Thread.run(Thread.java:834)
> {code}
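An aside on capturing dumps like the attached "jstack" file (illustrative commands; they assume a standalone/YARN executor JVM whose main class is CoarseGrainedExecutorBackend, and a JDK with jps/jstack on the PATH):

```shell
# Find the executor JVM's pid via jps, then capture a full thread dump.
pid=$(jps -l | awk '/CoarseGrainedExecutorBackend/{print $1}')
jstack "$pid" > executor-jstack.txt

# Task threads stuck on the shuffle fetch queue show up parked inside
# ShuffleBlockFetcherIterator; count them:
grep -c 'ShuffleBlockFetcherIterator' executor-jstack.txt
```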
[jira] [Comment Edited] (SPARK-19883) Executor is waiting for lock
[ https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902972#comment-15902972 ]

hustfxj edited comment on SPARK-19883 at 3/9/17 12:25 PM:
----------------------------------------------------------

[~srowen] A stage is still blocked because one of its tasks is still waiting like this; it has been blocked since 11:20 AM. I don't think this is expected behavior, and it looks like the same issue as https://issues.apache.org/jira/browse/SPARK-19300.

was (Author: hustfxj):
[~srowen] A stage is still blocked because one of its tasks is still waiting like this; it has been blocked since 11:20 AM. The executor's log shows the same waiting trace.
[jira] [Reopened] (SPARK-19883) Executor is waiting for lock
[ https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hustfxj reopened SPARK-19883:
-----------------------------

I think this is a real issue, like https://issues.apache.org/jira/browse/SPARK-19300.
[jira] [Commented] (SPARK-19883) Executor is waiting for lock
[ https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902972#comment-15902972 ]

hustfxj commented on SPARK-19883:
---------------------------------

[~srowen] A stage is still blocked because one of its tasks is still waiting like this; it has been blocked since 11:20 AM. The executor's log shows the same waiting trace.
[jira] [Issue Comment Deleted] (SPARK-19883) Executor is waiting for lock
[ https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hustfxj updated SPARK-19883:
----------------------------
    Comment: was deleted

(was: And the Netty version is 4.0.43.Final.)
[jira] [Commented] (SPARK-19883) Executor is waiting for lock
[ https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902962#comment-15902962 ]

hustfxj commented on SPARK-19883:
---------------------------------

And the Netty version is 4.0.43.Final.
[jira] [Updated] (SPARK-19883) Executor is waiting for lock
[ https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19883: Attachment: jstack the full thread dump > Executor is waiting for lock > > > Key: SPARK-19883 > URL: https://issues.apache.org/jira/browse/SPARK-19883 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: hustfxj > Attachments: jstack > > > {code:borderStyle=solid} > "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 > tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000498c249c0> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58) > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at 
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) > at org.apache.spark.scheduler.Task.run(Task.scala:114) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) > at java.lang.Thread.run(Thread.java:834)
[jira] [Created] (SPARK-19883) Executor is waiting for lock
hustfxj created SPARK-19883: --- Summary: Executor is waiting for lock Key: SPARK-19883 URL: https://issues.apache.org/jira/browse/SPARK-19883 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.1.0 Reporter: hustfxj {code:borderStyle=solid} "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000498c249c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58) at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) at org.apache.spark.scheduler.Task.run(Task.scala:114) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622) at java.lang.Thread.run(Thread.java:834)
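The parked state in the dump above is easy to reproduce in miniature. The following self-contained Java sketch is hypothetical (not Spark code; the class and variable names are invented for illustration): a consumer thread parks inside `LinkedBlockingQueue.take()` on an empty queue, which is exactly the WAITING (parking) state that `ShuffleBlockFetcherIterator.next` shows in the jstack output when no fetch result is ever delivered.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class ParkedConsumer {
    public static void main(String[] args) throws Exception {
        // Stand-in for the executor's queue of fetched shuffle blocks.
        LinkedBlockingQueue<String> results = new LinkedBlockingQueue<>();

        // Stand-in for ShuffleBlockFetcherIterator.next(): take() parks the
        // calling thread until a result arrives, matching the
        // "WAITING (parking)" state in the thread dump above.
        Thread consumer = new Thread(() -> {
            try {
                results.take(); // blocks indefinitely if no result is ever enqueued
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        Thread.sleep(300); // give the consumer time to park on the empty queue
        System.out.println(consumer.getState()); // WAITING

        results.put("shuffle-block"); // delivering a result unparks the thread
        consumer.join(2000);
        System.out.println(consumer.isAlive()); // false
    }
}
```

This is why the symptom looks like a lock wait: the thread is not deadlocked on a monitor, it is parked on a condition that only a successful (or failed) network fetch can signal.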
[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19831: Description: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like *SendHeartbeat*; 2. It had better not receive the heartbeat master by *receive* method. Because any other rpc message may block the *receive* method. Then worker won't receive the heartbeat message. So it had better send the heartbeat master at an asynchronous timing thread . was: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like *SendHeartbeat*; 2. It had better receive the heartbeat master by *receive* method. Because any other rpc message may block the *receive* method. Then worker won't receive the heartbeat message. So it had better send the heartbeat master at an asynchronous timing thread . 
> Sending the heartbeat master from worker maybe blocked by other rpc messages > -- > > Key: SPARK-19831 > URL: https://issues.apache.org/jira/browse/SPARK-19831 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: hustfxj >Priority: Minor > > Cleaning the application may cost much time at worker, then it will block > that the worker send heartbeats master because the worker is extend > *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the > message *ApplicationFinished*, master will think the worker is dead. If the > worker has a driver, the driver will be scheduled by master again. So I think > it is the bug on spark. It may solve this problem by the followed suggests: > 1. It had better put the cleaning the application in a single asynchronous > thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages > like *SendHeartbeat*; > 2. It had better not receive the heartbeat master by *receive* method. > Because any other rpc message may block the *receive* method. Then worker > won't receive the heartbeat message. So it had better send the > heartbeat master at an asynchronous timing thread .
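Suggestion 2 above can be sketched outside of Spark. In this hypothetical Java example (names such as `receiveLoop` and `heartbeater` are invented for illustration, not Spark APIs), a single-threaded executor stands in for the `ThreadSafeRpcEndpoint`'s message loop; while one slow handler occupies it, a dedicated scheduled thread still sends heartbeats on time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkerHeartbeat {
    public static void main(String[] args) throws Exception {
        AtomicInteger heartbeats = new AtomicInteger();

        // Dedicated timer thread: heartbeats fire on schedule no matter
        // what the message-processing loop is doing.
        ScheduledExecutorService heartbeater = Executors.newSingleThreadScheduledExecutor();
        heartbeater.scheduleAtFixedRate(heartbeats::incrementAndGet, 0, 50, TimeUnit.MILLISECONDS);

        // Single-threaded "receive" loop, as in a ThreadSafeRpcEndpoint:
        // one long-running handler (e.g. application cleanup) occupies it fully.
        ExecutorService receiveLoop = Executors.newSingleThreadExecutor();
        receiveLoop.submit(() -> {
            try {
                Thread.sleep(500); // simulated slow cleanup on ApplicationFinished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread.sleep(500); // while the cleanup is still running...
        heartbeater.shutdownNow();
        receiveLoop.shutdownNow();

        // ...several heartbeats were already sent from the timer thread.
        System.out.println(heartbeats.get() >= 3);
    }
}
```

Had the heartbeat been submitted through `receiveLoop` instead, it would have queued behind the cleanup task and missed its deadline, which is the failure mode this issue describes.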
[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19831: Description: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like *SendHeartbeat*; 2. It had better not receive the heartbeat master by *receive* method. Because any other rpc message may block the *receive* method. Then worker won't receive the heartbeat message timely. So it had better send the heartbeat master at an asynchronous timing thread . was: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like *SendHeartbeat*; 2. It had better not receive the heartbeat master by *receive* method. Because any other rpc message may block the *receive* method. Then worker won't receive the heartbeat message. So it had better send the heartbeat master at an asynchronous timing thread . 
> Sending the heartbeat master from worker maybe blocked by other rpc messages > -- > > Key: SPARK-19831 > URL: https://issues.apache.org/jira/browse/SPARK-19831 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: hustfxj >Priority: Minor > > Cleaning the application may cost much time at worker, then it will block > that the worker send heartbeats master because the worker is extend > *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the > message *ApplicationFinished*, master will think the worker is dead. If the > worker has a driver, the driver will be scheduled by master again. So I think > it is the bug on spark. It may solve this problem by the followed suggests: > 1. It had better put the cleaning the application in a single asynchronous > thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages > like *SendHeartbeat*; > 2. It had better not receive the heartbeat master by *receive* method. > Because any other rpc message may block the *receive* method. Then worker > won't receive the heartbeat message timely. So it had better send the > heartbeat master at an asynchronous timing thread .
[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19831: Description: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like *SendHeartbeat*; 2. It had better receive the heartbeat master by *receive* method. Because any other rpc message may block the *receive* method. Then worker won't receive the heartbeat message. So it had better send the heartbeat master at an asynchronous timing thread . was: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like *SendHeartbeat*; 2. It had better not send the heartbeat master by *receive* method. Because any other rpc message may block the *receive* method. Then worker won't receive the heartbeat message. So it had better send the heartbeat master at an asynchronous timing thread . 
> Sending the heartbeat master from worker maybe blocked by other rpc messages > -- > > Key: SPARK-19831 > URL: https://issues.apache.org/jira/browse/SPARK-19831 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: hustfxj >Priority: Minor > > Cleaning the application may cost much time at worker, then it will block > that the worker send heartbeats master because the worker is extend > *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the > message *ApplicationFinished*, master will think the worker is dead. If the > worker has a driver, the driver will be scheduled by master again. So I think > it is the bug on spark. It may solve this problem by the followed suggests: > 1. It had better put the cleaning the application in a single asynchronous > thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages > like *SendHeartbeat*; > 2. It had better receive the heartbeat master by *receive* method. Because > any other rpc message may block the *receive* method. Then worker won't > receive the heartbeat message. So it had better send the heartbeat master at > an asynchronous timing thread .
[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19831: Description: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like *SendHeartbeat*; 2. It had better not send the heartbeat master by *receive* method. Because any other rpc message may block the *receive* method. Then worker won't receive the heartbeat message. So it had better send the heartbeat master at an asynchronous timing thread . was: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like *SendHeartbeat*; 2. It had better not send the heartbeat master by Rpc channel. Because any other rpc message may block the rpc channel. It had better send the heartbeat master at an asynchronous timing thread . 
> Sending the heartbeat master from worker maybe blocked by other rpc messages > -- > > Key: SPARK-19831 > URL: https://issues.apache.org/jira/browse/SPARK-19831 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: hustfxj >Priority: Minor > > Cleaning the application may cost much time at worker, then it will block > that the worker send heartbeats master because the worker is extend > *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the > message *ApplicationFinished*, master will think the worker is dead. If the > worker has a driver, the driver will be scheduled by master again. So I think > it is the bug on spark. It may solve this problem by the followed suggests: > 1. It had better put the cleaning the application in a single asynchronous > thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages > like *SendHeartbeat*; > 2. It had better not send the heartbeat master by *receive* method. Because > any other rpc message may block the *receive* method. Then worker won't > receive the heartbeat message. So it had better send the heartbeat master at > an asynchronous timing thread .
[jira] [Comment Edited] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898704#comment-15898704 ] hustfxj edited comment on SPARK-19831 at 3/7/17 4:15 AM: - [~zsxwing]. I only find the code which handles *ApplicationFinished* message is slow . So I also think such codes should be run in a separate thread. I will submit a PR which make the codes in a separate thread. was (Author: hustfxj): [~zsxwing]. I only find the code which handles *ApplicationFinished* message is slow until now. So I also think such codes should be run in a separate thread. > Sending the heartbeat master from worker maybe blocked by other rpc messages > -- > > Key: SPARK-19831 > URL: https://issues.apache.org/jira/browse/SPARK-19831 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: hustfxj >Priority: Minor > > Cleaning the application may cost much time at worker, then it will block > that the worker send heartbeats master because the worker is extend > *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the > message *ApplicationFinished*, master will think the worker is dead. If the > worker has a driver, the driver will be scheduled by master again. So I think > it is the bug on spark. It may solve this problem by the followed suggests: > 1. It had better put the cleaning the application in a single asynchronous > thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages > like *SendHeartbeat*; > 2. It had better not send the heartbeat master by Rpc channel. Because any > other rpc message may block the rpc channel. It had better send the heartbeat > master at an asynchronous timing thread .
[jira] [Commented] (SPARK-19829) The log about driver should support rolling like executor
[ https://issues.apache.org/jira/browse/SPARK-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898713#comment-15898713 ] hustfxj commented on SPARK-19829: - [~sowen] Yes, this is handled by systems like YARN. But my spark cluster is standalone. A standalone cluster should build its own rolling about driver' log in default other than defined log4j configuration. > The log about driver should support rolling like executor > - > > Key: SPARK-19829 > URL: https://issues.apache.org/jira/browse/SPARK-19829 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: hustfxj >Priority: Minor > > We should rollback the log of the driver , or the log maybe large!!! > {code:title=DriverRunner.java|borderStyle=solid} > // modify the runDriver > private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: > Boolean): Int = { > builder.directory(baseDir) > def initialize(process: Process): Unit = { > // Redirect stdout and stderr to files-- the old code > // val stdout = new File(baseDir, "stdout") > // CommandUtils.redirectStream(process.getInputStream, stdout) > // > // val stderr = new File(baseDir, "stderr") > // val formattedCommand = builder.command.asScala.mkString("\"", "\" > \"", "\"") > // val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, > "=" * 40) > // Files.append(header, stderr, StandardCharsets.UTF_8) > // CommandUtils.redirectStream(process.getErrorStream, stderr) > // Redirect its stdout and stderr to files-support rolling > val stdout = new File(baseDir, "stdout") > stdoutAppender = FileAppender(process.getInputStream, stdout, conf) > val stderr = new File(baseDir, "stderr") > val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", > "\"") > val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" > * 40) > Files.append(header, stderr, StandardCharsets.UTF_8) > stderrAppender = FileAppender(process.getErrorStream, stderr, conf) > } 
> runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise) > } > {code}
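The rolling behaviour that the patch above gets from Spark's `FileAppender` can be illustrated with a minimal size-based rotation sketch. This hypothetical Java example is not Spark's implementation; the 256-byte limit, the `append` helper, and the `"<name>.1"` rotation scheme are invented for illustration. It shows the core idea: once the live file reaches a size limit, rotate it aside and start fresh, so the driver's stderr stays bounded instead of growing without limit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class RollingLog {
    // Hypothetical helper: append one line, rotating the live file to
    // "<name>.1" first if it has already reached maxBytes.
    static void append(Path dir, String name, String line, long maxBytes) throws IOException {
        Path log = dir.resolve(name);
        if (Files.exists(log) && Files.size(log) >= maxBytes) {
            Files.move(log, dir.resolve(name + ".1"), StandardCopyOption.REPLACE_EXISTING);
        }
        Files.write(log, (line + "\n").getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("driver-logs");
        for (int i = 0; i < 100; i++) {
            append(dir, "stderr", "driver log line " + i, 256);
        }
        System.out.println(Files.exists(dir.resolve("stderr.1"))); // at least one rotation happened
        System.out.println(Files.size(dir.resolve("stderr")) < 512); // live file stays bounded
    }
}
```

Spark's `FileAppender` additionally reads the rolling strategy (size- or time-based) and limits from the `SparkConf`, which is why the patch passes `conf` through.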
[jira] [Commented] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898704#comment-15898704 ] hustfxj commented on SPARK-19831: - [~zsxwing]. I only find the code which handles *ApplicationFinished* message is slow until now. So I also think such codes should be run in a separate thread. > Sending the heartbeat master from worker maybe blocked by other rpc messages > -- > > Key: SPARK-19831 > URL: https://issues.apache.org/jira/browse/SPARK-19831 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: hustfxj >Priority: Minor > > Cleaning the application may cost much time at worker, then it will block > that the worker send heartbeats master because the worker is extend > *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the > message *ApplicationFinished*, master will think the worker is dead. If the > worker has a driver, the driver will be scheduled by master again. So I think > it is the bug on spark. It may solve this problem by the followed suggests: > 1. It had better put the cleaning the application in a single asynchronous > thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages > like *SendHeartbeat*; > 2. It had better not send the heartbeat master by Rpc channel. Because any > other rpc message may block the rpc channel. It had better send the heartbeat > master at an asynchronous timing thread .
[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19831: Description: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like *SendHeartbeat*; 2. It had better not send the heartbeat master by Rpc channel. Because any other rpc message may block the rpc channel. It had better send the heartbeat master at an asynchronous timing thread . was: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like SendHeartbeat; 2. It had better not send the heartbeat master by rpc channel. Because any other rpc message may block the rpc channel. It had better send the heartbeat master at an asynchronous timing thread . 
> Sending the heartbeat master from worker maybe blocked by other rpc messages > -- > > Key: SPARK-19831 > URL: https://issues.apache.org/jira/browse/SPARK-19831 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: hustfxj > > Cleaning the application may cost much time at worker, then it will block > that the worker send heartbeats master because the worker is extend > *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the > message *ApplicationFinished*, master will think the worker is dead. If the > worker has a driver, the driver will be scheduled by master again. So I think > it is the bug on spark. It may solve this problem by the followed suggests: > 1. It had better put the cleaning the application in a single asynchronous > thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages > like *SendHeartbeat*; > 2. It had better not send the heartbeat master by Rpc channel. Because any > other rpc message may block the rpc channel. It had better send the heartbeat > master at an asynchronous timing thread .
[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19831: Description: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. It may solve this problem by the followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like SendHeartbeat; 2. It had better not send the heartbeat master by rpc channel. Because any other rpc message may block the rpc channel. It had better send the heartbeat master at an asynchronous timing thread . was: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. I can solve this problem by followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like SendHeartbeat; 2. It had better not send the heartbeat master by rpc channel. Because any other rpc message may block the rpc channel. It had better send the heartbeat master at an asynchronous timing thread . 
> Sending the heartbeat master from worker maybe blocked by other rpc messages > -- > > Key: SPARK-19831 > URL: https://issues.apache.org/jira/browse/SPARK-19831 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: hustfxj > > Cleaning the application may cost much time at worker, then it will block > that the worker send heartbeats master because the worker is extend > *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the > message *ApplicationFinished*, master will think the worker is dead. If the > worker has a driver, the driver will be scheduled by master again. So I think > it is the bug on spark. It may solve this problem by the followed suggests: > 1. It had better put the cleaning the application in a single asynchronous > thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages > like SendHeartbeat; > 2. It had better not send the heartbeat master by rpc channel. Because any > other rpc message may block the rpc channel. It had better send the heartbeat > master at an asynchronous timing thread .
[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19831: Description: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. I can solve this problem by followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like SendHeartbeat; 2. It had better not send the heartbeat master by rpc channel. Because any other rpc message may block the rpc channel. It had better send the heartbeat master at an asynchronous timing thread . was: Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master and rpc messages because the worker is extend *ThreadSafeRpcEndpoint*. If the heartbeat from a worker is blocked by the message *ApplicationFinished*, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again. So I think it is the bug on spark. I can solve this problem by followed suggests: 1. It had better put the cleaning the application in a single asynchronous thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages like SendHeartbeat; 2. It had better not send the heartbeat master by rpc channel. Because any other rpc message may block the rpc channel. It had better send the heartbeat master at an asynchronous timing thread . 
[jira] [Updated] (SPARK-19831) Sending the heartbeat to the master from the worker may be blocked by other RPC messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19831: Description: Cleaning up a finished application can take a long time on the worker, and it blocks the worker from sending heartbeats and other RPC messages to the master because the Worker endpoint extends *ThreadSafeRpcEndpoint*, which processes messages one at a time. If a worker's heartbeat is queued behind an *ApplicationFinished* message, the master will consider the worker dead; if that worker hosts a driver, the master will schedule the driver again. I think this is a bug in Spark. It could be addressed in two ways: 1. Run application cleanup in a separate asynchronous thread, such as 'cleanupThreadExecutor', so it does not block other RPC messages like SendHeartbeat; 2. Do not send heartbeats to the master over the RPC channel, since any other RPC message can block that channel; send them from a dedicated timer thread instead.
[jira] [Updated] (SPARK-19831) Sending the heartbeat to the master from the worker may be blocked by other RPC messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19831: Summary: Sending the heartbeat master from worker maybe blocked by other rpc messages (was: Sending the heartbeat to master maybe blocked by other rpc messages)
[jira] [Created] (SPARK-19831) Sending the heartbeat to the master may be blocked by other RPC messages
hustfxj created SPARK-19831: --- Summary: Sending the heartbeat to master may be blocked by other rpc messages Key: SPARK-19831 URL: https://issues.apache.org/jira/browse/SPARK-19831 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: hustfxj Cleaning up a finished application can take a long time on the worker, and it blocks the worker from sending heartbeats and other RPC messages to the master because the Worker endpoint extends *ThreadSafeRpcEndpoint*, which processes messages one at a time, so the master will consider the worker dead. If that worker hosts a driver, the master will schedule the driver again. I think this is a bug in Spark. It could be addressed in two ways: 1. Run application cleanup in a separate asynchronous thread, such as 'cleanupThreadExecutor', so it does not block other RPC messages like SendHeartbeat; 2. Do not send heartbeats to the master over the RPC channel, since any other RPC message can block that channel; send them from a dedicated timer thread instead.
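Suggestion 1 above can be sketched in a few lines. This is a hypothetical illustration, not the actual Worker code: the names `cleanupThreadExecutor` and `cleanupApplicationDirectories` are assumptions mirroring the proposal. The point is that the RPC handler submits the slow work to a dedicated single-threaded executor and returns immediately, so the next queued message (e.g. SendHeartbeat) is not delayed.
{code:title=WorkerCleanupSketch.scala|borderStyle=solid}
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object WorkerCleanupSketch {
  // A dedicated single-threaded executor for application cleanup, so the
  // ThreadSafeRpcEndpoint message loop is never blocked by it.
  private val cleanupThreadExecutor = ExecutionContext.fromExecutorService(
    Executors.newSingleThreadExecutor())

  // Hypothetical stand-in for the real work-directory cleanup, which can be slow.
  private def cleanupApplicationDirectories(appId: String): Unit =
    Thread.sleep(5000)

  // Called from the RPC handler for ApplicationFinished: returns right away,
  // while the cleanup runs asynchronously on its own thread.
  def onApplicationFinished(appId: String): Unit = {
    Future {
      cleanupApplicationDirectories(appId)
    }(cleanupThreadExecutor)
  }
}
{code}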
[jira] [Updated] (SPARK-19829) The driver log should support rolling like the executor log
[ https://issues.apache.org/jira/browse/SPARK-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19829: Description: We should roll the driver log, otherwise it may grow very large.
{code:title=DriverRunner.scala|borderStyle=solid}
// modified runDriver
private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
  builder.directory(baseDir)
  def initialize(process: Process): Unit = {
    // Old code: redirect stdout and stderr directly to files
    // val stdout = new File(baseDir, "stdout")
    // CommandUtils.redirectStream(process.getInputStream, stdout)
    //
    // val stderr = new File(baseDir, "stderr")
    // val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
    // val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
    // Files.append(header, stderr, StandardCharsets.UTF_8)
    // CommandUtils.redirectStream(process.getErrorStream, stderr)

    // New code: redirect stdout and stderr through FileAppender to support rolling
    val stdout = new File(baseDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
    val stderr = new File(baseDir, "stderr")
    val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
    val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
    Files.append(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
  }
  runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
}
{code}
was: We should roll the driver log, otherwise it may grow very large.
[jira] [Created] (SPARK-19829) The driver log should support rolling like the executor log
hustfxj created SPARK-19829: --- Summary: The driver log should support rolling like the executor log Key: SPARK-19829 URL: https://issues.apache.org/jira/browse/SPARK-19829 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.0.0 Reporter: hustfxj We should roll the driver log, otherwise it may grow very large.
{code:title=DriverRunner.scala|borderStyle=solid}
// modified runDriver
private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
  builder.directory(baseDir)
  def initialize(process: Process): Unit = {
    // Old code: redirect stdout and stderr directly to files
    // val stdout = new File(baseDir, "stdout")
    // CommandUtils.redirectStream(process.getInputStream, stdout)
    //
    // val stderr = new File(baseDir, "stderr")
    // val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
    // val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
    // Files.append(header, stderr, StandardCharsets.UTF_8)
    // CommandUtils.redirectStream(process.getErrorStream, stderr)

    // New code: redirect stdout and stderr through FileAppender to support rolling
    val stdout = new File(baseDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
    val stderr = new File(baseDir, "stderr")
    val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
    val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
    Files.append(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
  }
  runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
}
{code}
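For context, executor log rolling is driven by the `spark.executor.logs.rolling.*` configuration properties, which the `FileAppender(inputStream, file, conf)` factory consults to choose between a plain and a rolling appender; the patch above simply reuses that path for the driver. The underlying mechanism can be illustrated with a minimal size-based roller. This is a self-contained sketch, not Spark's actual FileAppender:
{code:title=RollingSketch.scala|borderStyle=solid}
import java.io.{File, FileOutputStream, InputStream}

object RollingSketch {
  // Minimal size-based rolling: copy the stream into `file`, and whenever the
  // file exceeds maxBytes, rename it with an incrementing suffix and reopen a
  // fresh file, so no single log file grows without bound.
  def appendWithRolling(in: InputStream, file: File, maxBytes: Long): Unit = {
    val buf = new Array[Byte](8192)
    var rolledIndex = 0
    var out = new FileOutputStream(file, true)
    try {
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        if (file.length() > maxBytes) {
          out.close()
          rolledIndex += 1
          // roll over: keep the old contents under a numbered name
          file.renameTo(new File(file.getParentFile, file.getName + "." + rolledIndex))
          out = new FileOutputStream(file, true)
        }
        n = in.read(buf)
      }
    } finally {
      out.close()
    }
  }
}
{code}
Spark's real RollingFileAppender additionally supports time-based strategies and bounding the number of retained files.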
[jira] [Commented] (SPARK-19264) The worker should start the driver the same way the YARN AM does
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828402#comment-15828402 ] hustfxj commented on SPARK-19264: - @Sean Owen Why not solve it the way the AM of YARN does? As I remember, the ApplicationMaster monitors the user program's main thread, and when the main thread exits, the AM finishes. As follows:
{code:title=ApplicationMaster.scala|borderStyle=solid}
private def runDriver(securityMgr: SecurityManager): Unit = {
  addAmIpFilter()
  userClassThread = startUserApplication()

  // This a bit hacky, but we need to wait until the spark.driver.port property has
  // been set by the Thread executing the user class.
  logInfo("Waiting for spark context initialization...")
  val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
  try {
    val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
      Duration(totalWaitTime, TimeUnit.MILLISECONDS))
    if (sc != null) {
      rpcEnv = sc.env.rpcEnv
      val driverRef = runAMEndpoint(
        sc.getConf.get("spark.driver.host"),
        sc.getConf.get("spark.driver.port"),
        isClusterMode = true)
      registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl).getOrElse(""), securityMgr)
    } else {
      // Sanity check; should never happen in normal operation, since sc should only be null
      // if the user app did not create a SparkContext.
      if (!finished) {
        throw new IllegalStateException("SparkContext is null but app is still running!")
      }
    }
    userClassThread.join()
  } catch {
    case e: SparkException if e.getCause().isInstanceOf[TimeoutException] =>
      logError(s"SparkContext did not initialize after waiting for $totalWaitTime ms. " +
        "Please check earlier log output for errors. Failing the application.")
      finish(FinalApplicationStatus.FAILED,
        ApplicationMaster.EXIT_SC_NOT_INITED,
        "Timed out waiting for SparkContext.")
  }
}
{code}
> The worker should start the driver the same way the YARN AM does > --- > > Key: SPARK-19264 > URL: https://issues.apache.org/jira/browse/SPARK-19264 > Project: Spark > Issue Type: Improvement > Reporter: hustfxj > > I think that if the worker starts the driver via "ProcessBuilderLike", we cannot know whether the application's main thread has finished when the program has spawned non-daemon threads, because the JVM only terminates when no non-daemon thread is still running (or someone calls System.exit); the main thread may have finished long ago. The worker should start the driver the way the AM of YARN does. As follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
> mainMethod.invoke(null, userArgs.toArray)
> finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
> logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the application's state.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
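The pattern the comment points at, running the user class on a dedicated thread, joining it, and deriving the final status from how it ended, can be reduced to a few lines. This is a hypothetical sketch (the object and method names are illustrative), not the Worker or ApplicationMaster code:
{code:title=MainThreadMonitorSketch.scala|borderStyle=solid}
object MainThreadMonitorSketch {
  // Invoke a class's main(Array[String]) method on its own thread and report
  // whether it returned normally. join() is what lets the supervisor observe
  // the main thread's completion, independently of any other non-daemon
  // threads the user program may have spawned.
  def runAndMonitor(mainClass: Class[_], args: Array[String]): Boolean = {
    @volatile var succeeded = false
    val userThread = new Thread(new Runnable {
      def run(): Unit =
        try {
          mainClass.getMethod("main", classOf[Array[String]]).invoke(null, args)
          succeeded = true // main returned normally
        } catch {
          case _: Throwable => succeeded = false // main threw
        }
    }, "driver-user-main")
    userThread.start()
    userThread.join() // the supervisor now knows exactly when main is done
    succeeded
  }
}
{code}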
[jira] [Commented] (SPARK-19264) The worker should start the driver the same way the YARN AM does
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828366#comment-15828366 ] hustfxj commented on SPARK-19264: - Maybe you are right; we can't hard-kill the driver. But I don't think this is a good design. I still think the driver should terminate when the main thread exits. At the least, the documentation should warn users about this issue when their program spawns non-daemon threads. Thank you.
[jira] [Comment Edited] (SPARK-19264) The worker should start the driver the same way the YARN AM does
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827201#comment-15827201 ] hustfxj edited comment on SPARK-19264 at 1/18/17 1:39 AM: -- So I have to change the last lines of my code to:
{code}
try {
  ssc.awaitTermination()
} finally {
  System.exit(-1)
}
{code}
I don't think users should have to care about this issue. Since Spark provides awaitTermination(), the driver should exit once the main thread is done, whether or not it still contains unfinished non-daemon threads.
[jira] [Commented] (SPARK-19264) The worker should start the driver the same way the YARN AM does
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827201#comment-15827201 ] hustfxj commented on SPARK-19264: - So I have to change the last lines of my code to:
{code}
try {
  ssc.awaitTermination()
} finally {
  System.exit(-1)
}
{code}
[jira] [Comment Edited] (SPARK-19264) The worker should start the driver the same way the YARN AM does
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827181#comment-15827181 ] hustfxj edited comment on SPARK-19264 at 1/18/17 1:39 AM: -- [~srowen] Sorry, I didn't say it clearly. I mean the driver cannot finish while it still contains unfinished non-daemon threads. Look at the following example: the driver program should crash because of the exception, but in fact it cannot crash, because the timer thread is still running.
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))

// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(
  new Runnable() {
    def run() {
      try {
        System.out.println("runnable")
      } catch {
        case e: Exception =>
          System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
      }
    }
  }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)

val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}
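The behaviour in the example above does not need Spark to reproduce: any live non-daemon thread keeps the JVM running after the main thread dies, even when main dies with an exception. A minimal standalone demonstration (the class name is hypothetical):
{code:title=NonDaemonDemo.scala|borderStyle=solid}
import java.util.concurrent.{Executors, TimeUnit}

object NonDaemonDemo {
  def main(args: Array[String]): Unit = {
    // newSingleThreadScheduledExecutor uses a NON-daemon thread by default,
    // so this timer keeps the JVM alive after main() throws below.
    val timer = Executors.newSingleThreadScheduledExecutor()
    timer.scheduleAtFixedRate(new Runnable {
      def run(): Unit = println("timer still alive")
    }, 1, 5, TimeUnit.SECONDS)

    // The main thread dies here, but the process does not exit. Wrapping the
    // body in try/finally with System.exit(-1), as suggested in this thread,
    // or giving the executor a ThreadFactory that calls setDaemon(true),
    // would make the process terminate.
    throw new RuntimeException("main thread crashed")
  }
}
{code}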
[jira] [Comment Edited] (SPARK-19264) Work should start driver, the same to AM of yarn
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827181#comment-15827181 ] hustfxj edited comment on SPARK-19264 at 1/18/17 1:16 AM: -- [~srowen] sorry, I didn't say it clearly. I means the spark application can't be done when it contains other unfinished non-daemon. Look at the follows example. the driver program should crash due to the exception. But In fact the driver program can't crash because the timer threads still are runnning. {code:title=Test.scala|borderStyle=solid} val sparkConf = new SparkConf().setAppName("NetworkWordCount") sparkConf.set("spark.streaming.blockInterval", "1000ms") val ssc = new StreamingContext(sparkConf, Seconds(10)) //non-daemon thread val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor( new ThreadFactory() { def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread") }) scheduledExecutorService.scheduleAtFixedRate( new Runnable() { def run() { try { System.out.println("runable") } catch { case e: Exception => { System.out.println("ScheduledTask persistAllConsumerOffset exception", e) } } } }, 1000, 1000 * 5, TimeUnit.MILLISECONDS) val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2)) val words = lines.flatMap(_.split(" ")) val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10) wordCounts.foreachRDD{rdd => rdd.collect().foreach(println) throw new RuntimeException } ssc.start() ssc.awaitTermination() {code} was (Author: hustfxj): [~srowen] sorry, I didn't say it clearly. I means an spark application can't be done when it contains other unfinished non-daemon. 
for example {code:title=Test.scala|borderStyle=solid} val sparkConf = new SparkConf().setAppName("NetworkWordCount") sparkConf.set("spark.streaming.blockInterval", "1000ms") val ssc = new StreamingContext(sparkConf, Seconds(10)) //non-daemon thread val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor( new ThreadFactory() { def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread") }) scheduledExecutorService.scheduleAtFixedRate( new Runnable() { def run() { try { System.out.println("runable") } catch { case e: Exception => { System.out.println("ScheduledTask persistAllConsumerOffset exception", e) } } } }, 1000, 1000 * 5, TimeUnit.MILLISECONDS) val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2)) val words = lines.flatMap(_.split(" ")) val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10) wordCounts.foreachRDD{rdd => rdd.collect().foreach(println) throw new RuntimeException } ssc.start() ssc.awaitTermination() {code} > Work should start driver, the same to AM of yarn > --- > > Key: SPARK-19264 > URL: https://issues.apache.org/jira/browse/SPARK-19264 > Project: Spark > Issue Type: Improvement >Reporter: hustfxj > > I think work can't start driver by "ProcessBuilderLike", thus we can't > know the application's main thread is finished or not if the application's > main thread contains some non-daemon threads. Because the program terminates > when there no longer is any non-daemon thread running (or someone called > System.exit). The main thread can have finished long ago. > worker should start driver like AM of YARN . As followed: > {code:title=ApplicationMaster.scala|borderStyle=solid} > mainMethod.invoke(null, userArgs.toArray) > finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS) > logDebug("Done running users class") > {code} > Then the work can monitor the driver's main thread, and know the > application's state. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
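The behavior in the comment above follows from JVM thread semantics: the JVM exits only once every non-daemon thread has terminated. A minimal Java sketch of the usual workaround, marking the helper thread as a daemon so it cannot pin the driver JVM (the class and thread names here are illustrative, not Spark code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;

public class DaemonThreads {
    // Thread factory that marks its threads as daemons, so they cannot
    // keep the JVM alive after the main thread dies.
    static ThreadFactory daemonFactory(String name) {
        return r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) {
        // The comment's "Driver-Commit-Thread" was created WITHOUT
        // setDaemon(true), so it was non-daemon and kept the driver JVM
        // alive even after the RuntimeException killed the main logic.
        Thread t = daemonFactory("Driver-Commit-Thread").newThread(() -> {});
        System.out.println(t.isDaemon()); // prints: true

        ScheduledExecutorService ses =
            Executors.newSingleThreadScheduledExecutor(daemonFactory("Driver-Commit-Thread"));
        ses.shutdown(); // with daemon workers, even forgetting this would not hang the JVM
    }
}
```

With this factory, the streaming example in the comment would let the driver process exit once its main thread dies, which is exactly what the report says does not happen with a non-daemon timer thread.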
[jira] [Comment Edited] (SPARK-19264) Work should start driver, the same to AM of yarn
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827181#comment-15827181 ] hustfxj edited comment on SPARK-19264 at 1/18/17 1:12 AM: -- [~srowen] Sorry, I didn't say it clearly. I mean a Spark application can't finish when it contains other unfinished non-daemon threads. For example:
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))

// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
  def run(): Unit = {
    try {
      println("runnable")
    } catch {
      case e: Exception =>
        println("ScheduledTask persistAllConsumerOffset exception: " + e)
    }
  }
}, 1000, 1000 * 5, TimeUnit.MILLISECONDS)

val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}
was (Author: hustfxj): [~srowen] Sorry, I didn't say it clearly. I mean a Spark application can't finish when it contains other unfinished non-daemon threads.
For example:
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))

// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
  def run(): Unit = {
    try {
      println("runnable")
    } catch {
      case e: Exception =>
        println("ScheduledTask persistAllConsumerOffset exception: " + e)
    }
  }
}, 1000, 1000 * 5, TimeUnit.MILLISECONDS)

val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}
[jira] [Commented] (SPARK-19264) Work should start driver, the same to AM of yarn
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827181#comment-15827181 ] hustfxj commented on SPARK-19264: - [~srowen] Sorry, I didn't say it clearly. I mean a Spark application can't finish when it contains other unfinished non-daemon threads. For example:
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))

// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
  def run(): Unit = {
    try {
      println("runnable")
    } catch {
      case e: Exception =>
        println("ScheduledTask persistAllConsumerOffset exception: " + e)
    }
  }
}, 1000, 1000 * 5, TimeUnit.MILLISECONDS)

val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}
[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19264:
Description:
I think the worker can't start the driver by "ProcessBuilderLike", so we can't know whether the application's main thread has finished if it contains some non-daemon threads: the program terminates only when no non-daemon thread is running (or someone calls System.exit), and the main thread can have finished long ago. The worker should start the driver like the AM of YARN, as follows:
{code:title=ApplicationMaster.scala|borderStyle=solid}
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
{code}
Then the worker can monitor the driver's main thread and know the application's state.

was:
I think the worker can't start the driver by "ProcessBuilderLike", so we can't know whether the application's main thread has finished if it contains some daemon threads: the program terminates only when no non-daemon thread is running (or someone calls System.exit), and the main thread can have finished long ago. The worker should start the driver like the AM of YARN, as follows:
{code:title=ApplicationMaster.scala|borderStyle=solid}
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
{code}
Then the worker can monitor the driver's main thread and know the application's state.
[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19264:
Description:
I think the worker can't start the driver by "ProcessBuilderLike", so we can't know whether the application's main thread has finished if it contains some daemon threads: the program terminates only when no non-daemon thread is running (or someone calls System.exit), and the main thread can have finished long ago. The worker should start the driver like the AM of YARN, as follows:
{code:title=ApplicationMaster.scala|borderStyle=solid}
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
{code}
Then the worker can monitor the driver's main thread and know the application's state.

was:
I think the worker can't start the driver by "ProcessBuilderLike", so we can't know whether the application's main thread has finished if it contains some daemon threads: the program terminates only when no non-daemon thread is running (or someone calls System.exit), and the main thread can have finished long ago. The worker should start the driver like the AM of YARN, as follows:
{code:title=Bar.java|borderStyle=solid}
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
{code}
Then the worker can monitor the driver's main thread and know the application's state.
[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19264:
Description:
I think the worker can't start the driver by "ProcessBuilderLike", so we can't know whether the application's main thread has finished if it contains some daemon threads: the program terminates only when no non-daemon thread is running (or someone calls System.exit), and the main thread can have finished long ago. The worker should start the driver like the AM of YARN, as follows:
{code:title=Bar.java|borderStyle=solid}
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
{code}
Then the worker can monitor the driver's main thread and know the application's state.

was:
I think the worker can't start the driver by "ProcessBuilderLike", so we can't know whether the application's main thread has finished if it contains some daemon threads: the program terminates only when no non-daemon thread is running (or someone calls System.exit), and the main thread can have finished long ago. The worker should start the driver like the AM of YARN, as follows:
```
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
```
Then the worker can monitor the driver's main thread and know the application's state.
[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19264:
Description:
I think the worker can't start the driver by "ProcessBuilderLike", so we can't know whether the application's main thread has finished if it contains some daemon threads: the program terminates only when no non-daemon thread is running (or someone calls System.exit), and the main thread can have finished long ago. The worker should start the driver like the AM of YARN, as follows:
```
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
```
Then the worker can monitor the driver's main thread and know the application's state.

was:
I think the worker can't start the driver by "ProcessBuilderLike", so we can't know whether the application's main thread has finished if it contains some daemon threads: the program terminates only when no non-daemon thread is running (or someone calls System.exit), and the main thread can have finished long ago. The worker should start the driver like the AM of YARN, as follows:
```
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
```
Then the worker can monitor the driver's main thread and know the application's state.
[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn
[ https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19264: Summary: Work should start driver, the same to AM of yarn (was: Work maybe should start driver, the same to AM of yarn)
[jira] [Created] (SPARK-19264) Work maybe should start driver, the same to AM of yarn
hustfxj created SPARK-19264:
---------------------------
Summary: Work maybe should start driver, the same to AM of yarn
Key: SPARK-19264
URL: https://issues.apache.org/jira/browse/SPARK-19264
Project: Spark
Issue Type: Improvement
Reporter: hustfxj

I think the worker can't start the driver by "ProcessBuilderLike", so we can't know whether the application's main thread has finished if it contains some daemon threads: the program terminates only when no non-daemon thread is running (or someone calls System.exit), and the main thread can have finished long ago. The worker should start the driver like the AM of YARN, as follows:
```
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
```
Then the worker can monitor the driver's main thread and know the application's state.
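The ApplicationMaster-style launch the report asks for can be sketched in plain Java: load the user's `main` method reflectively, run it on a dedicated thread, and join that thread so the launcher knows exactly when the user code has finished. `UserApp` and the method names below are hypothetical stand-ins for illustration, not Spark's actual Worker code:

```java
import java.lang.reflect.Method;

public class InvokeMain {
    // Hypothetical user class standing in for the application's main class.
    public static class UserApp {
        public static volatile boolean finished = false;
        public static void main(String[] args) { finished = true; }
    }

    // Run the user's main in a dedicated thread and wait for it, so the
    // launcher can observe when the main thread itself has finished --
    // the behavior the reporter wants from the standalone Worker.
    public static void runUserClass(Class<?> cls, String[] userArgs) throws Exception {
        Method mainMethod = cls.getMethod("main", String[].class);
        Thread userThread = new Thread(() -> {
            try {
                mainMethod.invoke(null, (Object) userArgs);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, "Driver");
        userThread.start();
        userThread.join(); // here a real launcher could report SUCCEEDED / FAILED, like the AM
    }

    public static void main(String[] args) throws Exception {
        runUserClass(UserApp.class, new String[0]);
        System.out.println(UserApp.finished); // prints: true
    }
}
```

Note that joining the main thread only tells the launcher the main method returned; as the report itself observes, other non-daemon threads may still keep the JVM alive afterwards.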
[jira] [Created] (SPARK-19052) the rest api don't support multiple standby masters on standalone cluster
hustfxj created SPARK-19052:
---------------------------
Summary: the rest api don't support multiple standby masters on standalone cluster
Key: SPARK-19052
URL: https://issues.apache.org/jira/browse/SPARK-19052
Project: Spark
Issue Type: Bug
Reporter: hustfxj

If the job is submitted through the REST API, the driver only knows a single master's address, which comes from StandaloneRestServer's masterUrl. So we should give priority to setting "spark.master" from "sparkProperties".
[jira] [Commented] (SPARK-18959) invalid resource statistics for standalone cluster
[ https://issues.apache.org/jira/browse/SPARK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15791056#comment-15791056 ] hustfxj commented on SPARK-18959: - Yes, I have a fix. You can see the link https://github.com/apache/spark/pull/16446
> invalid resource statistics for standalone cluster
> --------------------------------------------------
>
> Key: SPARK-18959
> URL: https://issues.apache.org/jira/browse/SPARK-18959
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Reporter: hustfxj
> Priority: Minor
> Attachments: 屏幕快照 2016-12-21 11.49.12.png
>
> Workers
> Worker Id                                  Address              State  Cores        Memory
> worker-20161220162751-10.125.6.222-59295   10.125.6.222:59295   ALIVE  4 (-1 Used)  6.8 GB (-1073741824.0 B Used)
> worker-20161220164233-10.218.135.80-10944  10.218.135.80:10944  ALIVE  4 (0 Used)   6.8 GB (0.0 B Used)
[jira] [Commented] (SPARK-19042) Remove query string from jar url for executor
[ https://issues.apache.org/jira/browse/SPARK-19042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15789488#comment-15789488 ] hustfxj commented on SPARK-19042: - https://github.com/apache/spark/pull/16443
> Remove query string from jar url for executor
> ---------------------------------------------
>
> Key: SPARK-19042
> URL: https://issues.apache.org/jira/browse/SPARK-19042
> Project: Spark
> Issue Type: Bug
> Reporter: hustfxj
>
> spark.jars supports jar URLs with the http protocol. However, if the URL contains any query string, `localName = name.split("/").last` won't get the expected jar name, and then `val url = new File(SparkFiles.getRootDirectory(), localName).toURI.toURL` produces an invalid URL. The bug fix is the same as [SPARK-17855].
[jira] [Created] (SPARK-19042) Remove query string from jar url for executor
hustfxj created SPARK-19042:
---------------------------
Summary: Remove query string from jar url for executor
Key: SPARK-19042
URL: https://issues.apache.org/jira/browse/SPARK-19042
Project: Spark
Issue Type: Bug
Reporter: hustfxj

spark.jars supports jar URLs with the http protocol. However, if the URL contains any query string, `localName = name.split("/").last` won't get the expected jar name, and then `val url = new File(SparkFiles.getRootDirectory(), localName).toURI.toURL` produces an invalid URL. The bug fix is the same as [SPARK-17855].
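The fix the report points at can be illustrated with a small Java helper: taking the last segment of the URL's *path* (rather than of the raw string) drops any query string before deriving the local file name. The URL below is made up for illustration; this is a sketch of the idea, not Spark's actual patch:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class JarName {
    // Derive the local file name from a jar URL, ignoring any query string.
    // The reported bug: splitting the raw URL on "/" keeps "?token=..." in
    // the name. URI.getPath() drops the query part first.
    static String localJarName(String url) throws URISyntaxException {
        String path = new URI(url).getPath(); // e.g. "/libs/app.jar"
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(localJarName("http://repo.example.com/libs/app.jar?token=abc"));
        // prints: app.jar
    }
}
```

Splitting the raw string would instead have produced "app.jar?token=abc", which is the invalid local name the report describes.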
[jira] [Updated] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster
[ https://issues.apache.org/jira/browse/SPARK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19011:
Description: A large standalone cluster may have lots of applications and drivers, so I may not be able to tell from the master page which driver started which application. So I think we can add the driver ID to the ApplicationDescription. In fact, I have already finished this.
> ApplicationDescription should add the Submission ID for the standalone cluster
> ------------------------------------------------------------------------------
>
> Key: SPARK-19011
> URL: https://issues.apache.org/jira/browse/SPARK-19011
> Project: Spark
> Issue Type: Improvement
> Reporter: hustfxj
>
> A large standalone cluster may have lots of applications and drivers, so I may not be able to tell from the master page which driver started which application. So I think we can add the driver ID to the ApplicationDescription. In fact, I have already finished this.
[jira] [Commented] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster
[ https://issues.apache.org/jira/browse/SPARK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780310#comment-15780310 ] hustfxj commented on SPARK-19011: - A large standalone cluster may have lots of applications and drivers, so I may not be able to tell from the master page which driver started which application. So I think we can add the driver ID to the ApplicationDescription. In fact, I have already finished this.
[jira] [Created] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster
hustfxj created SPARK-19011:
---------------------------
Summary: ApplicationDescription should add the Submission ID for the standalone cluster
Key: SPARK-19011
URL: https://issues.apache.org/jira/browse/SPARK-19011
Project: Spark
Issue Type: Improvement
Reporter: hustfxj
[jira] [Commented] (SPARK-18959) invalid resource statistics for standalone cluster
[ https://issues.apache.org/jira/browse/SPARK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768696#comment-15768696 ] hustfxj commented on SPARK-18959: - 2.2.0-SNAPSHOT
[jira] [Updated] (SPARK-18959) invalid resource statistics for standalone cluster
[ https://issues.apache.org/jira/browse/SPARK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-18959:
Attachment: 屏幕快照 2016-12-21 11.49.12.png
The attachment is the master page.
[jira] [Commented] (SPARK-18959) invalid resource statistics for standalone cluster
[ https://issues.apache.org/jira/browse/SPARK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766020#comment-15766020 ] hustfxj commented on SPARK-18959: - The attachment is the master page.
[jira] [Created] (SPARK-18959) invalid resource statistics for standalone cluster
hustfxj created SPARK-18959:
---------------------------
Summary: invalid resource statistics for standalone cluster
Key: SPARK-18959
URL: https://issues.apache.org/jira/browse/SPARK-18959
Project: Spark
Issue Type: Bug
Reporter: hustfxj

Workers
Worker Id                                  Address              State  Cores        Memory
worker-20161220162751-10.125.6.222-59295   10.125.6.222:59295   ALIVE  4 (-1 Used)  6.8 GB (-1073741824.0 B Used)
worker-20161220164233-10.218.135.80-10944  10.218.135.80:10944  ALIVE  4 (0 Used)   6.8 GB (0.0 B Used)
[jira] [Closed] (SPARK-18706) Can spark support exactly once based kafka ? Due to these following question.
[ https://issues.apache.org/jira/browse/SPARK-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj closed SPARK-18706.
Resolution: Duplicate
> Can spark support exactly once based kafka ? Due to these following question.
> -----------------------------------------------------------------------------
>
> Key: SPARK-18706
> URL: https://issues.apache.org/jira/browse/SPARK-18706
> Project: Spark
> Issue Type: Question
> Reporter: hustfxj
[jira] [Commented] (SPARK-18706) Can spark support exactly once based kafka ? Due to these following question.
[ https://issues.apache.org/jira/browse/SPARK-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720087#comment-15720087 ] hustfxj commented on SPARK-18706: -
1. When a task completes an operation, it notifies the driver. The driver may not receive the message due to the network and may think the task is still running. Then won't the child stage never be scheduled?
2. How does Spark guarantee that the downstream task receives the shuffle data completely? In fact, I can't find a checksum for blocks in Spark. For example, the upstream task may shuffle 100 MB of data, but the downstream task may receive only 99 MB due to the network. Can Spark verify, based on size, that the data was received completely?
[jira] [Created] (SPARK-18706) Can spark support exactly once based kafka ? Due to these following question.
hustfxj created SPARK-18706:
---------------------------
Summary: Can spark support exactly once based kafka ? Due to these following question.
Key: SPARK-18706
URL: https://issues.apache.org/jira/browse/SPARK-18706
Project: Spark
Issue Type: Question
Reporter: hustfxj
[jira] [Created] (SPARK-18707) Can spark support exactly once based kafka ? Due to these following question?
hustfxj created SPARK-18707: --- Summary: Can spark support exactly once based kafka ? Due to these following question? Key: SPARK-18707 URL: https://issues.apache.org/jira/browse/SPARK-18707 Project: Spark Issue Type: Question Reporter: hustfxj 1. When a task completes its operation, it notifies the driver. The driver may not receive that message due to network problems and still consider the task running; in that case, won't the child stage fail to be scheduled? 2. How does Spark guarantee that a downstream task receives the shuffle data completely? In fact, I cannot find any checksum for blocks in Spark. For example, an upstream task may shuffle 100 MB of data, but the downstream task may receive only 99 MB due to the network. Can Spark verify, based on size, that the data was received completely?
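On the exactly-once question: end-to-end exactly-once is commonly built from three ingredients: a replayable source (Kafka offsets), at-least-once task retries, and an idempotent or transactional sink that deduplicates by offset, so that a retried task replaying a batch produces no duplicate output. A minimal sketch of the idempotent-sink half of that pattern (all names hypothetical; this illustrates the general technique, not a Spark API):

```python
class IdempotentSink:
    """Hypothetical sink: remembers which (partition, offset) pairs were
    already committed, so a replayed write becomes a no-op."""

    def __init__(self):
        self.committed = set()
        self.rows = []

    def write(self, partition: int, offset: int, row) -> bool:
        key = (partition, offset)
        if key in self.committed:   # duplicate delivery from a retried task
            return False
        self.rows.append(row)
        self.committed.add(key)
        return True

sink = IdempotentSink()
batch = [(0, 0, "a"), (0, 1, "b")]
for p, o, row in batch:
    sink.write(p, o, row)

# The driver missed the ack, so the task is retried and replays the batch:
for p, o, row in batch:
    sink.write(p, o, row)

assert sink.rows == ["a", "b"]  # output stays exactly-once despite the retry
```

This is why a lost completion message (question 1) degrades to a retry rather than to duplicated results: delivery is at-least-once, and deduplication at the sink restores exactly-once output.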
[jira] [Commented] (SPARK-18573) when restart yarn cluster, the spark streaming job will lost ?
[ https://issues.apache.org/jira/browse/SPARK-18573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691986#comment-15691986 ] hustfxj commented on SPARK-18573: - When I restart the YARN cluster, the Spark Streaming jobs are lost, so I have to resubmit all of them. How can we ensure the streaming jobs are not lost when the YARN cluster is restarted? > when restart yarn cluster, the spark streaming job will lost ? > --- > > Key: SPARK-18573 > URL: https://issues.apache.org/jira/browse/SPARK-18573 > Project: Spark > Issue Type: Question >Reporter: hustfxj >
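The usual remedy here is checkpointing: Spark Streaming's `StreamingContext.getOrCreate(checkpointPath, creatingFunc)` restores a driver from checkpointed state when the job is resubmitted, so a restart resumes rather than starts over. A minimal plain-Python sketch of that checkpoint-and-resume pattern (file layout and names are illustrative, not Spark's actual checkpoint format):

```python
import json
import os
import tempfile

def get_or_create(checkpoint_path: str, create_fn):
    """Mimics the shape of StreamingContext.getOrCreate: restore saved
    progress if a checkpoint exists, otherwise build fresh state."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            return json.load(f)
    return create_fn()

def checkpoint(state, checkpoint_path: str):
    # Write to a temp file and rename, so a crash mid-write cannot
    # leave a corrupt checkpoint behind.
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, checkpoint_path)

ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")

# First run: no checkpoint yet, start from offset 0 and process a batch.
state = get_or_create(ckpt, lambda: {"offset": 0})
state["offset"] += 100
checkpoint(state, ckpt)

# "Cluster restart": a fresh process resumes from the saved offset.
resumed = get_or_create(ckpt, lambda: {"offset": 0})
assert resumed["offset"] == 100
```

The job still has to be resubmitted after a full cluster restart (YARN does not persist the application itself), but with a checkpoint directory on a durable filesystem the resubmitted driver picks up where it left off instead of losing progress.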
[jira] [Created] (SPARK-18573) when restart yarn cluster, the spark streaming job will lost ?
hustfxj created SPARK-18573: --- Summary: when restart yarn cluster, the spark streaming job will lost ? Key: SPARK-18573 URL: https://issues.apache.org/jira/browse/SPARK-18573 Project: Spark Issue Type: Question Reporter: hustfxj