[jira] [Issue Comment Deleted] (SPARK-19300) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19300:

Comment: was deleted

(was: [~zsxwing] I also hit this issue; see
https://issues.apache.org/jira/browse/SPARK-19883. My Netty version is already
4.0.43.Final.)

> Executor is waiting for lock
> 
>
> Key: SPARK-19300
> URL: https://issues.apache.org/jira/browse/SPARK-19300
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: cen yuhai
>Priority: Critical
> Attachments: stderr.jpg, WAITING.jpg
>
>
> I can see that all the threads in the executor are waiting for a lock.
> {code}
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:313)
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown Source)
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$2.hasNext(WholeStageCodegenExec.scala:396)
> org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:138)
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
> org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:342)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:293)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> org.apache.spark.scheduler.Task.run(Task.scala:99)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:330)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
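Every frame above bottoms out in `LinkedBlockingQueue.take()`: the task thread parks until a fetch result is enqueued, and if the producer side (the network fetch callback) never delivers, the task hangs forever with no error. The pattern can be reproduced outside Spark. This is a minimal sketch with a hypothetical stand-in queue, not Spark's actual fetch code, contrasting `take()` with a `poll()` that at least surfaces the stall:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class FetchWaitDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for ShuffleBlockFetcherIterator's internal results queue.
        LinkedBlockingQueue<String> results = new LinkedBlockingQueue<>();

        // The producer (a network callback) is assumed lost, so nothing is
        // ever enqueued. take() would park this thread indefinitely, which
        // is exactly the WAITING (parking) state shown in the stack trace:
        //   String block = results.take();   // never returns

        // poll() with a timeout turns the silent hang into a diagnosable
        // condition that a caller could retry or fail on:
        String block = results.poll(100, TimeUnit.MILLISECONDS);
        if (block == null) {
            System.out.println("fetch timed out; retry or fail the task");
        }
    }
}
```

Whether a timeout belongs in Spark's fetch loop is a design question for the actual fix; the sketch only illustrates why the hang produces no log output.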



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SPARK-19300) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902988#comment-15902988
 ] 

hustfxj commented on SPARK-19300:
-

[~zsxwing] I also hit this issue; see
https://issues.apache.org/jira/browse/SPARK-19883. My Netty version is already
4.0.43.Final.



[jira] [Closed] (SPARK-19829) The log about driver should support rolling like executor

2017-03-09 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj closed SPARK-19829.
---
Resolution: Invalid

> The log about driver should support rolling like executor
> -
>
> Key: SPARK-19829
> URL: https://issues.apache.org/jira/browse/SPARK-19829
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: hustfxj
>Priority: Minor
>
> We should roll the driver's log; otherwise it may grow very large.
> {code:title=DriverRunner.scala|borderStyle=solid}
> // modified runDriver
>   private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
>     builder.directory(baseDir)
>     def initialize(process: Process): Unit = {
>       // Old code: redirect stdout and stderr straight to files
>       // val stdout = new File(baseDir, "stdout")
>       // CommandUtils.redirectStream(process.getInputStream, stdout)
>       //
>       // val stderr = new File(baseDir, "stderr")
>       // val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
>       // val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
>       // Files.append(header, stderr, StandardCharsets.UTF_8)
>       // CommandUtils.redirectStream(process.getErrorStream, stderr)
>
>       // New code: redirect stdout and stderr via FileAppender to support rolling
>       val stdout = new File(baseDir, "stdout")
>       stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
>       val stderr = new File(baseDir, "stderr")
>       val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
>       val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
>       Files.append(header, stderr, StandardCharsets.UTF_8)
>       stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
>     }
>     runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
>   }
> {code}
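The change replaces `CommandUtils.redirectStream` with Spark's `FileAppender` factory, which decides from the configuration whether and how to roll the file. The underlying idea, size-based rolling, can be sketched independently of Spark. The class below is an illustrative stand-alone writer, not Spark's appender; the suffix scheme and threshold are assumptions:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

/** Minimal size-based rolling writer: when the active file exceeds
 *  maxBytes, rename it with a numeric suffix and start a fresh file. */
public class RollingLogWriter {
    private final File dir;
    private final String baseName;
    private final long maxBytes;
    private int rollIndex = 0;

    public RollingLogWriter(File dir, String baseName, long maxBytes) {
        this.dir = dir;
        this.baseName = baseName;
        this.maxBytes = maxBytes;
    }

    public void append(String line) throws IOException {
        File active = new File(dir, baseName);
        if (active.exists() && active.length() >= maxBytes) {
            // Roll: stdout -> stdout.1, stdout.2, ... (a real implementation
            // would also cap the number of retained files)
            active.renameTo(new File(dir, baseName + "." + (++rollIndex)));
        }
        try (FileWriter w = new FileWriter(active, true)) {
            w.write(line);
            w.write('\n');
        }
    }
}
```

Spark's own rolling for executors is driven by the `spark.executor.logs.rolling.*` settings; the point of the proposal is simply to give the driver's stdout/stderr the same treatment.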




-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19829) The log about driver should support rolling like executor

2017-03-09 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902982#comment-15902982
 ] 

hustfxj commented on SPARK-19829:
-

ok.




[jira] [Commented] (SPARK-19883) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902980#comment-15902980
 ] 

hustfxj commented on SPARK-19883:
-

[~srowen] I saw this issue again today, so Spark may not have fixed it.

> Executor is waiting for lock
> 
>
> Key: SPARK-19883
> URL: https://issues.apache.org/jira/browse/SPARK-19883
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: hustfxj
> Attachments: jstack
>
>
> {code:borderStyle=solid}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000498c249c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
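Given the attached jstack, a quick way to verify the claim that the executor's task threads are all parked in the same place is to count WAITING thread stanzas containing the suspect frame. The helper below is a rough triage sketch, and the embedded dump is a trimmed stand-in for a real jstack file:

```java
public class DumpTriage {
    /** Count thread stanzas (separated by blank lines) that are WAITING
     *  and mention the given frame, e.g. ShuffleBlockFetcherIterator.next. */
    public static int countParkedIn(String dump, String frame) {
        int n = 0;
        for (String stanza : dump.split("\n\n")) {
            if (stanza.contains("java.lang.Thread.State: WAITING")
                    && stanza.contains(frame)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String dump =
            "\"task worker 1\" daemon\n"
          + "   java.lang.Thread.State: WAITING (parking)\n"
          + "\tat java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)\n"
          + "\tat org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)\n"
          + "\n"
          + "\"rpc thread\" daemon\n"
          + "   java.lang.Thread.State: RUNNABLE\n"
          + "\tat io.netty.channel.epoll.Native.epollWait(Native Method)\n";
        System.out.println(countParkedIn(dump, "ShuffleBlockFetcherIterator.next"));
        // prints 1: one thread parked in the shuffle fetch
    }
}
```

Running this over the real attachment would distinguish "one stuck task" from "all task slots stuck", which matters for deciding whether the blockage is in the fetcher or upstream in the network layer.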






[jira] [Comment Edited] (SPARK-19883) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902972#comment-15902972
 ] 

hustfxj edited comment on SPARK-19883 at 3/9/17 12:25 PM:
--

[~srowen] A stage is still blocked because one of its tasks is stuck waiting
like this; it has been blocked since 11:20 AM. I do think this is a problem,
and it looks like the same issue as
https://issues.apache.org/jira/browse/SPARK-19300






was (Author: hustfxj):
[~srowen] A stage is still blocked because one of its tasks is stuck waiting
like this; it has been blocked since 11:20 AM.
The executor's log looks like this:







[jira] [Reopened] (SPARK-19883) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj reopened SPARK-19883:
-

I think this is a real issue, like
https://issues.apache.org/jira/browse/SPARK-19300




[jira] [Commented] (SPARK-19883) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902972#comment-15902972
 ] 

hustfxj commented on SPARK-19883:
-

[~srowen] A stage is still blocked because one of its tasks is stuck waiting
like this; it has been blocked since 11:20 AM.
The executor's log looks like this:







[jira] [Issue Comment Deleted] (SPARK-19883) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19883:

Comment: was deleted

(was: And the Netty version is 4.0.43.Final.)




[jira] [Commented] (SPARK-19883) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902962#comment-15902962
 ] 

hustfxj commented on SPARK-19883:
-

And the Netty version is 4.0.43.Final.
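Dependency-version questions like this are easiest to settle by asking the running JVM which Netty it actually loaded, rather than trusting the build file. Netty 4.x exposes `io.netty.util.Version.identify()` for exactly this; the reflective wrapper below is our own addition so the check also degrades gracefully when Netty is not on the classpath:

```java
public class NettyVersionCheck {
    public static String loadedNettyVersion() {
        try {
            // io.netty.util.Version.identify() maps artifact name -> version
            Class<?> version = Class.forName("io.netty.util.Version");
            Object versions = version.getMethod("identify").invoke(null);
            return versions.toString();
        } catch (ClassNotFoundException e) {
            return "Netty not on classpath";
        } catch (ReflectiveOperationException e) {
            return "could not determine Netty version: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadedNettyVersion());
    }
}
```

Run inside an executor (e.g. from a task or via a debug endpoint) this would confirm whether 4.0.43.Final is really the version on the executor classpath, or whether an older Netty is shadowing it.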




[jira] [Updated] (SPARK-19883) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19883:

Attachment: jstack

The full thread dump is attached.

> Executor is waiting for lock
> 
>
> Key: SPARK-19883
> URL: https://issues.apache.org/jira/browse/SPARK-19883
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: hustfxj
> Attachments: jstack
>
>
> {code:borderStyle=solid}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)






[jira] [Created] (SPARK-19883) Executor is waiting for lock

2017-03-09 Thread hustfxj (JIRA)
hustfxj created SPARK-19883:
---

 Summary: Executor is waiting for lock
 Key: SPARK-19883
 URL: https://issues.apache.org/jira/browse/SPARK-19883
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
Reporter: hustfxj


{code:borderStyle=solid}
"Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x000498c249c0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:834)
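In the trace above, the task thread is parked in `LinkedBlockingQueue.take()` called from `ShuffleBlockFetcherIterator.next()`: if the fetch result it waits for is never enqueued (for example, because a network callback was lost), `take()` blocks forever. A minimal Java sketch of that failure mode (illustrative only, not Spark's actual code), contrasting `take()` with a bounded `poll(timeout)`:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: why a consumer parked in take() shows up in jstack as
// "WAITING (parking)" forever when the expected element is never enqueued.
public class TakeVsPoll {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> results = new LinkedBlockingQueue<>();

        // results.take() here would park this thread indefinitely -- exactly
        // the state captured in the thread dump above.
        // A bounded poll makes the missing result observable instead:
        String r = results.poll(100, TimeUnit.MILLISECONDS);
        System.out.println(r == null ? "no result within timeout" : r);
    }
}
```

A `poll` with a timeout (or an iterator-level fetch timeout) at least surfaces the hang instead of leaving every executor thread parked.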







[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-06 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:

Description: 
Cleaning up an application may take a long time on the worker, which blocks the worker from sending heartbeats to the master because the worker extends *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by the *ApplicationFinished* message, the master will consider the worker dead, and if that worker hosts a driver, the master will schedule the driver again. So I think this is a bug in Spark. It could be solved with the following suggestions:

1. Move application cleanup into a separate asynchronous thread such as 'cleanupThreadExecutor', so that it won't block other RPC messages like *SendHeartbeat*;

2. Don't handle the master heartbeat in the *receive* method, because any other RPC message may block *receive* and the worker would then miss the heartbeat. Instead, send the heartbeat to the master from a separate scheduled thread.
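Suggestion 2 above can be sketched as follows. This is an illustrative Java sketch under stated assumptions, not Spark's actual Worker code; the `HeartbeatSender` and `sendHeartbeatToMaster` names are hypothetical. The point is that heartbeats fire from a dedicated scheduler thread, so a slow RPC handler can never delay them.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a worker-side heartbeat loop on its own thread, so
// heartbeats are never queued behind other RPC messages (e.g. a slow
// ApplicationFinished handler). Names are illustrative, not Spark's.
public class HeartbeatSender {
    final AtomicLong sent = new AtomicLong();

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "worker-heartbeat");
            t.setDaemon(true); // don't keep the JVM alive just for heartbeats
            return t;
        });

    public void start(long intervalMs) {
        // Fixed-rate schedule: independent of the RPC receive loop.
        scheduler.scheduleAtFixedRate(
            this::sendHeartbeatToMaster, 0, intervalMs, TimeUnit.MILLISECONDS);
    }

    // Hypothetical: a real worker would fire the Heartbeat RPC here.
    void sendHeartbeatToMaster() {
        sent.incrementAndGet();
    }
}
```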

  was:
Cleaning the application may cost much time at worker, then it will block that  
the worker send heartbeats master because the worker is extend 
*ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
message *ApplicationFinished*,  master will think the worker is dead. If the 
worker has a driver, the driver will be scheduled by master again. So I think 
it is the bug on spark. It may solve this problem by the followed suggests:

1. It had better  put the cleaning the application in a single asynchronous 
thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
like *SendHeartbeat*;

2. It had better receive the heartbeat master by *receive* method.  Because any 
other rpc message may block the *receive* method. Then worker won't receive the 
heartbeat message. So it had better send the heartbeat master at an 
asynchronous timing thread .


> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>Priority: Minor
>
> Cleaning the application may cost much time at worker, then it will block 
> that  the worker send heartbeats master because the worker is extend 
> *ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
> message *ApplicationFinished*,  master will think the worker is dead. If the 
> worker has a driver, the driver will be scheduled by master again. So I think 
> it is the bug on spark. It may solve this problem by the followed suggests:
> 1. It had better  put the cleaning the application in a single asynchronous 
> thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
> like *SendHeartbeat*;
> 2. It had better not receive the heartbeat master by *receive* method.  
> Because any other rpc message may block the *receive* method. Then worker 
> won't receive the heartbeat message. So it had better send the heartbeat 
> master at an asynchronous timing thread .






[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-06 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:

Description: 
Cleaning up an application may take a long time on the worker, which blocks the worker from sending heartbeats to the master because the worker extends *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by the *ApplicationFinished* message, the master will consider the worker dead, and if that worker hosts a driver, the master will schedule the driver again. So I think this is a bug in Spark. It could be solved with the following suggestions:

1. Move application cleanup into a separate asynchronous thread such as 'cleanupThreadExecutor', so that it won't block other RPC messages like *SendHeartbeat*;

2. Don't handle the master heartbeat in the *receive* method, because any other RPC message may block *receive* and the worker would then not receive the heartbeat in time. Instead, send the heartbeat to the master from a separate scheduled thread.

  was:
Cleaning the application may cost much time at worker, then it will block that  
the worker send heartbeats master because the worker is extend 
*ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
message *ApplicationFinished*,  master will think the worker is dead. If the 
worker has a driver, the driver will be scheduled by master again. So I think 
it is the bug on spark. It may solve this problem by the followed suggests:

1. It had better  put the cleaning the application in a single asynchronous 
thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
like *SendHeartbeat*;

2. It had better not receive the heartbeat master by *receive* method.  Because 
any other rpc message may block the *receive* method. Then worker won't receive 
the heartbeat message. So it had better send the heartbeat master at an 
asynchronous timing thread .


> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>Priority: Minor
>
> Cleaning the application may cost much time at worker, then it will block 
> that  the worker send heartbeats master because the worker is extend 
> *ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
> message *ApplicationFinished*,  master will think the worker is dead. If the 
> worker has a driver, the driver will be scheduled by master again. So I think 
> it is the bug on spark. It may solve this problem by the followed suggests:
> 1. It had better  put the cleaning the application in a single asynchronous 
> thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
> like *SendHeartbeat*;
> 2. It had better not receive the heartbeat master by *receive* method.  
> Because any other rpc message may block the *receive* method. Then worker 
> won't receive the heartbeat message timely. So it had better send the 
> heartbeat master at an asynchronous timing thread .






[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-06 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:

Description: 
Cleaning up an application may take a long time on the worker, which blocks the worker from sending heartbeats to the master because the worker extends *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by the *ApplicationFinished* message, the master will consider the worker dead, and if that worker hosts a driver, the master will schedule the driver again. So I think this is a bug in Spark. It could be solved with the following suggestions:

1. Move application cleanup into a separate asynchronous thread such as 'cleanupThreadExecutor', so that it won't block other RPC messages like *SendHeartbeat*;

2. Don't handle the master heartbeat in the *receive* method, because any other RPC message may block *receive* and the worker would then miss the heartbeat. Instead, send the heartbeat to the master from a separate scheduled thread.

  was:
Cleaning the application may cost much time at worker, then it will block that  
the worker send heartbeats master because the worker is extend 
*ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
message *ApplicationFinished*,  master will think the worker is dead. If the 
worker has a driver, the driver will be scheduled by master again. So I think 
it is the bug on spark. It may solve this problem by the followed suggests:

1. It had better  put the cleaning the application in a single asynchronous 
thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
like *SendHeartbeat*;

2. It had better not send the heartbeat master by *receive* method.  Because 
any other rpc message may block the *receive* method. Then worker won't receive 
the heartbeat message. So it had better send the heartbeat master at an 
asynchronous timing thread .


> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>Priority: Minor
>
> Cleaning the application may cost much time at worker, then it will block 
> that  the worker send heartbeats master because the worker is extend 
> *ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
> message *ApplicationFinished*,  master will think the worker is dead. If the 
> worker has a driver, the driver will be scheduled by master again. So I think 
> it is the bug on spark. It may solve this problem by the followed suggests:
> 1. It had better  put the cleaning the application in a single asynchronous 
> thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
> like *SendHeartbeat*;
> 2. It had better receive the heartbeat master by *receive* method.  Because 
> any other rpc message may block the *receive* method. Then worker won't 
> receive the heartbeat message. So it had better send the heartbeat master at 
> an asynchronous timing thread .






[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-06 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:

Description: 
Cleaning up an application may take a long time on the worker, which blocks the worker from sending heartbeats to the master because the worker extends *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by the *ApplicationFinished* message, the master will consider the worker dead, and if that worker hosts a driver, the master will schedule the driver again. So I think this is a bug in Spark. It could be solved with the following suggestions:

1. Move application cleanup into a separate asynchronous thread such as 'cleanupThreadExecutor', so that it won't block other RPC messages like *SendHeartbeat*;

2. Don't send the master heartbeat from the *receive* method, because any other RPC message may block *receive* and delay the heartbeat. Instead, send it from a separate scheduled thread.

  was:
Cleaning the application may cost much time at worker, then it will block that  
the worker send heartbeats master because the worker is extend 
*ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
message *ApplicationFinished*,  master will think the worker is dead. If the 
worker has a driver, the driver will be scheduled by master again. So I think 
it is the bug on spark. It may solve this problem by the followed suggests:

1. It had better  put the cleaning the application in a single asynchronous 
thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
like *SendHeartbeat*;

2. It had better not send the heartbeat master by Rpc channel. Because any 
other rpc message may block the rpc channel. It had better send the heartbeat 
master at an asynchronous timing thread .


> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>Priority: Minor
>
> Cleaning the application may cost much time at worker, then it will block 
> that  the worker send heartbeats master because the worker is extend 
> *ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
> message *ApplicationFinished*,  master will think the worker is dead. If the 
> worker has a driver, the driver will be scheduled by master again. So I think 
> it is the bug on spark. It may solve this problem by the followed suggests:
> 1. It had better  put the cleaning the application in a single asynchronous 
> thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
> like *SendHeartbeat*;
> 2. It had better not send the heartbeat master by *receive* method.  Because 
> any other rpc message may block the *receive* method. Then worker won't 
> receive the heartbeat message. So it had better send the heartbeat master at 
> an asynchronous timing thread .






[jira] [Comment Edited] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-06 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898704#comment-15898704
 ] 

hustfxj edited comment on SPARK-19831 at 3/7/17 4:15 AM:
-

[~zsxwing] So far the only slow code I have found is the handler for the *ApplicationFinished* message, so I think that code should run in a separate thread. I will submit a PR that moves it into one.


was (Author: hustfxj):
[~zsxwing]. I only find the code which handles *ApplicationFinished* message  
is slow until now. So I also think  such codes should be run in a separate 
thread.

> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>Priority: Minor
>
> Cleaning the application may cost much time at worker, then it will block 
> that  the worker send heartbeats master because the worker is extend 
> *ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
> message *ApplicationFinished*,  master will think the worker is dead. If the 
> worker has a driver, the driver will be scheduled by master again. So I think 
> it is the bug on spark. It may solve this problem by the followed suggests:
> 1. It had better  put the cleaning the application in a single asynchronous 
> thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
> like *SendHeartbeat*;
> 2. It had better not send the heartbeat master by Rpc channel. Because any 
> other rpc message may block the rpc channel. It had better send the heartbeat 
> master at an asynchronous timing thread .






[jira] [Commented] (SPARK-19829) The log about driver should support rolling like executor

2017-03-06 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898713#comment-15898713
 ] 

hustfxj commented on SPARK-19829:
-

[~sowen] Yes, this is handled by systems like YARN, but my Spark cluster is standalone. A standalone cluster should roll the driver's log by default, rather than relying on a user-defined log4j configuration.

> The log about driver should support rolling like executor
> -
>
> Key: SPARK-19829
> URL: https://issues.apache.org/jira/browse/SPARK-19829
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: hustfxj
>Priority: Minor
>
> We should roll the driver's log; otherwise it may grow very large.
> {code:title=DriverRunner.scala|borderStyle=solid}
> // modified runDriver
>   private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
>     builder.directory(baseDir)
>     def initialize(process: Process): Unit = {
>       // Old code: redirect stdout and stderr straight to files (no rolling)
>       // val stdout = new File(baseDir, "stdout")
>       // CommandUtils.redirectStream(process.getInputStream, stdout)
>       //
>       // val stderr = new File(baseDir, "stderr")
>       // val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
>       // val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
>       // Files.append(header, stderr, StandardCharsets.UTF_8)
>       // CommandUtils.redirectStream(process.getErrorStream, stderr)
>
>       // New code: redirect stdout and stderr through FileAppender, which supports rolling
>       val stdout = new File(baseDir, "stdout")
>       stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
>       val stderr = new File(baseDir, "stderr")
>       val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
>       val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
>       Files.append(header, stderr, StandardCharsets.UTF_8)
>       stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
>     }
>     runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
>   }
> {code}






[jira] [Commented] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-06 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898704#comment-15898704
 ] 

hustfxj commented on SPARK-19831:
-

[~zsxwing] So far the only slow code I have found is the handler for the *ApplicationFinished* message, so I think that code should run in a separate thread.

> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>Priority: Minor
>
> Cleaning the application may cost much time at worker, then it will block 
> that  the worker send heartbeats master because the worker is extend 
> *ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
> message *ApplicationFinished*,  master will think the worker is dead. If the 
> worker has a driver, the driver will be scheduled by master again. So I think 
> it is the bug on spark. It may solve this problem by the followed suggests:
> 1. It had better  put the cleaning the application in a single asynchronous 
> thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
> like *SendHeartbeat*;
> 2. It had better not send the heartbeat master by Rpc channel. Because any 
> other rpc message may block the rpc channel. It had better send the heartbeat 
> master at an asynchronous timing thread .






[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-05 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:

Description: 
Cleaning up an application may take a long time on the worker, which blocks the worker from sending heartbeats to the master because the worker extends *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by the *ApplicationFinished* message, the master will consider the worker dead, and if that worker hosts a driver, the master will schedule the driver again. So I think this is a bug in Spark. It could be solved with the following suggestions:

1. Move application cleanup into a separate asynchronous thread such as 'cleanupThreadExecutor', so that it won't block other RPC messages like *SendHeartbeat*;

2. Don't send the heartbeat to the master over the RPC channel, because any other RPC message may block that channel. Instead, send the heartbeat from a separate scheduled thread.

  was:
Cleaning the application may cost much time at worker, then it will block that  
the worker send heartbeats master because the worker is extend 
*ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
message *ApplicationFinished*,  master will think the worker is dead. If the 
worker has a driver, the driver will be scheduled by master again. So I think 
it is the bug on spark. It may solve this problem by the followed suggests:

1. It had better  put the cleaning the application in a single asynchronous 
thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
like SendHeartbeat;

2. It had better not send the heartbeat master by rpc channel. Because any 
other rpc message may block the rpc channel. It had better send the heartbeat 
master at an asynchronous timing thread .


> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>
> Cleaning the application may cost much time at worker, then it will block 
> that  the worker send heartbeats master because the worker is extend 
> *ThreadSafeRpcEndpoint*. If the heartbeat from a worker  is blocked  by the 
> message *ApplicationFinished*,  master will think the worker is dead. If the 
> worker has a driver, the driver will be scheduled by master again. So I think 
> it is the bug on spark. It may solve this problem by the followed suggests:
> 1. It had better  put the cleaning the application in a single asynchronous 
> thread like 'cleanupThreadExecutor'. Thus it won't block other rpc messages 
> like *SendHeartbeat*;
> 2. It had better not send the heartbeat master by Rpc channel. Because any 
> other rpc message may block the rpc channel. It had better send the heartbeat 
> master at an asynchronous timing thread .






[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-05 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:

Description: 
Cleaning up an application may take a long time on the worker, which blocks the worker from sending heartbeats to the master because the worker extends *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by the *ApplicationFinished* message, the master will consider the worker dead, and if that worker hosts a driver, the master will schedule the driver again. So I think this is a bug in Spark. It could be solved with the following suggestions:

1. Move application cleanup into a separate asynchronous thread such as 'cleanupThreadExecutor', so that it won't block other RPC messages like SendHeartbeat;

2. Don't send the heartbeat to the master over the RPC channel, because any other RPC message may block that channel. Instead, send the heartbeat from a separate scheduled thread.

  was:
Cleaning up an application may take a long time on the worker, which blocks the 
worker from sending heartbeats to the master, because the worker extends 
*ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by an 
*ApplicationFinished* message, the master will consider the worker dead. If the 
worker hosts a driver, the master will schedule that driver again. So I think 
this is a bug in Spark. I think the problem can be solved by the following 
suggestions:

1. Application cleanup should run in a separate asynchronous thread, such as 
'cleanupThreadExecutor', so it does not block other RPC messages like 
SendHeartbeat;

2. The heartbeat to the master should not be sent through the RPC channel, 
because any other RPC message can block it. Instead, the heartbeat should be 
sent from an asynchronous timer thread.


> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>
> Cleaning up an application may take a long time on the worker, which blocks the 
> worker from sending heartbeats to the master, because the worker extends 
> *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by an 
> *ApplicationFinished* message, the master will consider the worker dead. If the 
> worker hosts a driver, the master will schedule that driver again. So I think 
> this is a bug in Spark. The following suggestions may solve the problem:
> 1. Application cleanup should run in a separate asynchronous thread, such as 
> 'cleanupThreadExecutor', so it does not block other RPC messages like 
> SendHeartbeat;
> 2. The heartbeat to the master should not be sent through the RPC channel, 
> because any other RPC message can block it. Instead, the heartbeat should be 
> sent from an asynchronous timer thread.






[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-05 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:

Description: 
Cleaning up an application may take a long time on the worker, which blocks the 
worker from sending heartbeats to the master, because the worker extends 
*ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by an 
*ApplicationFinished* message, the master will consider the worker dead. If the 
worker hosts a driver, the master will schedule that driver again. So I think 
this is a bug in Spark. I think the problem can be solved by the following 
suggestions:

1. Application cleanup should run in a separate asynchronous thread, such as 
'cleanupThreadExecutor', so it does not block other RPC messages like 
SendHeartbeat;

2. The heartbeat to the master should not be sent through the RPC channel, 
because any other RPC message can block it. Instead, the heartbeat should be 
sent from an asynchronous timer thread.

  was:
Cleaning up an application may take a long time on the worker, which blocks the 
worker from sending heartbeats and other RPC messages to the master, because the 
worker extends *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by an 
*ApplicationFinished* message, the master will consider the worker dead. If the 
worker hosts a driver, the master will schedule that driver again. So I think 
this is a bug in Spark. I think the problem can be solved by the following 
suggestions:

1. Application cleanup should run in a separate asynchronous thread, such as 
'cleanupThreadExecutor', so it does not block other RPC messages like 
SendHeartbeat;

2. The heartbeat to the master should not be sent through the RPC channel, 
because any other RPC message can block it. Instead, the heartbeat should be 
sent from an asynchronous timer thread.


> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>
> Cleaning up an application may take a long time on the worker, which blocks the 
> worker from sending heartbeats to the master, because the worker extends 
> *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by an 
> *ApplicationFinished* message, the master will consider the worker dead. If the 
> worker hosts a driver, the master will schedule that driver again. So I think 
> this is a bug in Spark. I think the problem can be solved by the following 
> suggestions:
> 1. Application cleanup should run in a separate asynchronous thread, such as 
> 'cleanupThreadExecutor', so it does not block other RPC messages like 
> SendHeartbeat;
> 2. The heartbeat to the master should not be sent through the RPC channel, 
> because any other RPC message can block it. Instead, the heartbeat should be 
> sent from an asynchronous timer thread.






[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-05 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:

Description: 
Cleaning up an application may take a long time on the worker, which blocks the 
worker from sending heartbeats and other RPC messages to the master, because the 
worker extends *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked by an 
*ApplicationFinished* message, the master will consider the worker dead. If the 
worker hosts a driver, the master will schedule that driver again. So I think 
this is a bug in Spark. I think the problem can be solved by the following 
suggestions:

1. Application cleanup should run in a separate asynchronous thread, such as 
'cleanupThreadExecutor', so it does not block other RPC messages like 
SendHeartbeat;

2. The heartbeat to the master should not be sent through the RPC channel, 
because any other RPC message can block it. Instead, the heartbeat should be 
sent from an asynchronous timer thread.

  was:
Cleaning up an application may take a long time on the worker, which blocks the 
worker from sending heartbeats and other RPC messages to the master, because the 
worker extends *ThreadSafeRpcEndpoint*. So the master will consider the worker 
dead. If the worker hosts a driver, the master will schedule that driver again. 
So I think this is a bug in Spark. I think the problem can be solved by the 
following suggestions:

1. Application cleanup should run in a separate asynchronous thread, such as 
'cleanupThreadExecutor', so it does not block other RPC messages like 
SendHeartbeat;

2. The heartbeat to the master should not be sent through the RPC channel, 
because any other RPC message can block it. Instead, the heartbeat should be 
sent from an asynchronous timer thread.


> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>
> Cleaning up an application may take a long time on the worker, which blocks the 
> worker from sending heartbeats and other RPC messages to the master, because 
> the worker extends *ThreadSafeRpcEndpoint*. If a worker's heartbeat is blocked 
> by an *ApplicationFinished* message, the master will consider the worker dead. 
> If the worker hosts a driver, the master will schedule that driver again. So I 
> think this is a bug in Spark. I think the problem can be solved by the 
> following suggestions:
> 1. Application cleanup should run in a separate asynchronous thread, such as 
> 'cleanupThreadExecutor', so it does not block other RPC messages like 
> SendHeartbeat;
> 2. The heartbeat to the master should not be sent through the RPC channel, 
> because any other RPC message can block it. Instead, the heartbeat should be 
> sent from an asynchronous timer thread.






[jira] [Updated] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages

2017-03-05 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19831:

Summary: Sending the heartbeat  master from worker  maybe blocked by other 
rpc messages  (was: Sending the heartbeat to master maybe blocked by other rpc 
messages)

> Sending the heartbeat  master from worker  maybe blocked by other rpc messages
> --
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: hustfxj
>
> Cleaning up an application may take a long time on the worker, which blocks the 
> worker from sending heartbeats and other RPC messages to the master, because 
> the worker extends *ThreadSafeRpcEndpoint*. So the master will consider the 
> worker dead. If the worker hosts a driver, the master will schedule that driver 
> again. So I think this is a bug in Spark. I think the problem can be solved by 
> the following suggestions:
> 1. Application cleanup should run in a separate asynchronous thread, such as 
> 'cleanupThreadExecutor', so it does not block other RPC messages like 
> SendHeartbeat;
> 2. The heartbeat to the master should not be sent through the RPC channel, 
> because any other RPC message can block it. Instead, the heartbeat should be 
> sent from an asynchronous timer thread.






[jira] [Created] (SPARK-19831) Sending the heartbeat to master maybe blocked by other rpc messages

2017-03-05 Thread hustfxj (JIRA)
hustfxj created SPARK-19831:
---

 Summary: Sending the heartbeat to master maybe blocked by other 
rpc messages
 Key: SPARK-19831
 URL: https://issues.apache.org/jira/browse/SPARK-19831
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: hustfxj


Cleaning up an application may take a long time on the worker, which blocks the 
worker from sending heartbeats and other RPC messages to the master, because the 
worker extends *ThreadSafeRpcEndpoint*. So the master will consider the worker 
dead. If the worker hosts a driver, the master will schedule that driver again. 
So I think this is a bug in Spark. I think the problem can be solved by the 
following suggestions:

1. Application cleanup should run in a separate asynchronous thread, such as 
'cleanupThreadExecutor', so it does not block other RPC messages like 
SendHeartbeat;

2. The heartbeat to the master should not be sent through the RPC channel, 
because any other RPC message can block it. Instead, the heartbeat should be 
sent from an asynchronous timer thread.






[jira] [Updated] (SPARK-19829) The log about driver should support rolling like executor

2017-03-05 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19829:

Description: 
The driver log should support rolling; otherwise the log file may grow very large.

{code:title=DriverRunner.scala|borderStyle=solid}
// modify the runDriver
  private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
    builder.directory(baseDir)
    def initialize(process: Process): Unit = {
      // Old code: redirect stdout and stderr straight to files
      // val stdout = new File(baseDir, "stdout")
      // CommandUtils.redirectStream(process.getInputStream, stdout)
      //
      // val stderr = new File(baseDir, "stderr")
      // val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
      // val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
      // Files.append(header, stderr, StandardCharsets.UTF_8)
      // CommandUtils.redirectStream(process.getErrorStream, stderr)

      // New code: redirect stdout and stderr through a FileAppender to support rolling
      val stdout = new File(baseDir, "stdout")
      stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
      val stderr = new File(baseDir, "stderr")
      val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
      val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
      Files.append(header, stderr, StandardCharsets.UTF_8)
      stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
    }
    runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
  }
{code}
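For context, the rolling that FileAppender applies is governed by Spark's executor log-rolling properties (real property names; the values below are illustrative, not defaults). Shown as a plain map to keep the sketch self-contained:

```scala
// FileAppender selects a rolling strategy from these SparkConf properties.
// Example values only; a real deployment would set them on the SparkConf.
val rollingConf = Map(
  "spark.executor.logs.rolling.strategy"         -> "size",      // or "time"
  "spark.executor.logs.rolling.maxSize"          -> "134217728", // bytes per rolled file
  "spark.executor.logs.rolling.maxRetainedFiles" -> "5"          // older files are deleted
)
```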



  was:
The driver log should support rolling; otherwise the log file may grow very large.

{code:title=Bar.java|borderStyle=solid}
// modify the runDriver
  private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
    builder.directory(baseDir)
    def initialize(process: Process): Unit = {
      // Old code: redirect stdout and stderr straight to files
      // val stdout = new File(baseDir, "stdout")
      // CommandUtils.redirectStream(process.getInputStream, stdout)
      //
      // val stderr = new File(baseDir, "stderr")
      // val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
      // val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
      // Files.append(header, stderr, StandardCharsets.UTF_8)
      // CommandUtils.redirectStream(process.getErrorStream, stderr)

      // New code: redirect stdout and stderr through a FileAppender to support rolling
      val stdout = new File(baseDir, "stdout")
      stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
      val stderr = new File(baseDir, "stderr")
      val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
      val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
      Files.append(header, stderr, StandardCharsets.UTF_8)
      stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
    }
    runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
  }
{code}




> The log about driver should support rolling like executor
> -
>
> Key: SPARK-19829
> URL: https://issues.apache.org/jira/browse/SPARK-19829
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: hustfxj
>
> The driver log should support rolling; otherwise the log file may grow very large.
> {code:title=DriverRunner.java|borderStyle=solid}
> // modify the runDriver
>   private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: 
> Boolean): Int = {
> builder.directory(baseDir)
> def initialize(process: Process): Unit = {
>   // Redirect stdout and stderr to files-- the old code
> //  val stdout = new File(baseDir, "stdout")
> //  CommandUtils.redirectStream(process.getInputStream, stdout)
> //
> //  val stderr = new File(baseDir, "stderr")
> //  val formattedCommand = builder.command.asScala.mkString("\"", "\" 
> \"", "\"")
> //  val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, 
> "=" * 40)
> //  Files.append(header, stderr, StandardCharsets.UTF_8)
> //  CommandUtils.redirectStream(process.getErrorStream, stderr)
>   // Redirect its stdout and stderr to files-support rolling
>   val stdout = new File(baseDir, "stdout")
>   stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
>   val stderr = new File(baseDir, "stderr")
>   val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", 
> "\"")
>   val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" 
> * 40)
>   Files.append(header, stderr, StandardCharsets.UTF_8)
>   stderrAppender = 

[jira] [Created] (SPARK-19829) The log about driver should support rolling like executor

2017-03-05 Thread hustfxj (JIRA)
hustfxj created SPARK-19829:
---

 Summary: The log about driver should support rolling like executor
 Key: SPARK-19829
 URL: https://issues.apache.org/jira/browse/SPARK-19829
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: hustfxj


The driver log should support rolling; otherwise the log file may grow very large.

{code:title=Bar.java|borderStyle=solid}
// modify the runDriver
  private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
    builder.directory(baseDir)
    def initialize(process: Process): Unit = {
      // Old code: redirect stdout and stderr straight to files
      // val stdout = new File(baseDir, "stdout")
      // CommandUtils.redirectStream(process.getInputStream, stdout)
      //
      // val stderr = new File(baseDir, "stderr")
      // val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
      // val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
      // Files.append(header, stderr, StandardCharsets.UTF_8)
      // CommandUtils.redirectStream(process.getErrorStream, stderr)

      // New code: redirect stdout and stderr through a FileAppender to support rolling
      val stdout = new File(baseDir, "stdout")
      stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
      val stderr = new File(baseDir, "stderr")
      val formattedCommand = builder.command.asScala.mkString("\"", "\" \"", "\"")
      val header = "Launch Command: %s\n%s\n\n".format(formattedCommand, "=" * 40)
      Files.append(header, stderr, StandardCharsets.UTF_8)
      stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
    }
    runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
  }
{code}








[jira] [Commented] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-18 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828402#comment-15828402
 ] 

hustfxj commented on SPARK-19264:
-

@Sean Owen Why not solve it the way the YARN ApplicationMaster does? As I 
remember, the ApplicationMaster monitors the user program's main thread; when 
the main thread quits, the AM finishes. As follows:
{code:title=ApplicationMaster.scala|borderStyle=solid}   
private def runDriver(securityMgr: SecurityManager): Unit = {
    addAmIpFilter()
    userClassThread = startUserApplication()

    // This a bit hacky, but we need to wait until the spark.driver.port property has
    // been set by the Thread executing the user class.
    logInfo("Waiting for spark context initialization...")
    val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
    try {
      val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
        Duration(totalWaitTime, TimeUnit.MILLISECONDS))
      if (sc != null) {
        rpcEnv = sc.env.rpcEnv
        val driverRef = runAMEndpoint(
          sc.getConf.get("spark.driver.host"),
          sc.getConf.get("spark.driver.port"),
          isClusterMode = true)
        registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl).getOrElse(""),
          securityMgr)
      } else {
        // Sanity check; should never happen in normal operation, since sc should
        // only be null if the user app did not create a SparkContext.
        if (!finished) {
          throw new IllegalStateException("SparkContext is null but app is still running!")
        }
      }
      userClassThread.join()
    } catch {
      case e: SparkException if e.getCause().isInstanceOf[TimeoutException] =>
        logError(
          s"SparkContext did not initialize after waiting for $totalWaitTime ms. " +
            "Please check earlier log output for errors. Failing the application.")
        finish(FinalApplicationStatus.FAILED,
          ApplicationMaster.EXIT_SC_NOT_INITED,
          "Timed out waiting for SparkContext.")
    }
  }
{code}
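The key line in the AM code above is userClassThread.join(): the user's main method runs in a monitored thread, and joining that thread tells the AM exactly when the application finished. A stripped-down sketch of the same pattern (hypothetical helper, not Spark code):

```scala
// Run the user's main method in a dedicated, monitored thread; joining the
// thread tells the caller when the user application has finished.
def runUserClass(body: () => Unit): Thread = {
  val t = new Thread(new Runnable {
    def run(): Unit = body()
  }, "user-main")
  t.start()
  t
}
```

A worker using this pattern would know the application's state directly from the monitored thread, which is the behaviour the comment asks for.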



> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker shouldn't start the driver via "ProcessBuilderLike", 
> because then we can't know whether the application's main thread has finished 
> if the application has spawned non-daemon threads: the JVM terminates only when 
> no non-daemon thread is running (or someone calls System.exit), so the main 
> thread may have finished long ago. 
> The worker should start the driver the way the AM of YARN does. As follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the 
> application's state. 






[jira] [Commented] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-18 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828366#comment-15828366
 ] 

hustfxj commented on SPARK-19264:
-

Maybe you are right. We can't hard-kill the driver, but I don't think this is a 
good design. I still think the driver should terminate when the main thread 
exits. At the least, the documentation should warn users about this issue when 
their program spawns non-daemon threads. Thank you.
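The JVM rule underlying this whole discussion: the process exits only when no non-daemon threads remain. A minimal illustration of the daemon flag (illustrative only):

```scala
// A thread marked as daemon cannot keep the JVM (and thus the driver
// process) alive; a non-daemon thread can. The flag must be set before
// the thread is started.
def newBackgroundThread(daemon: Boolean): Thread = {
  val t = new Thread(new Runnable {
    def run(): Unit = Thread.sleep(60000) // simulated long-running work
  }, "background-worker")
  t.setDaemon(daemon)
  t
}
```

A driver that spawns its timers with daemon = true would exit as soon as its main thread finishes, which is the behaviour argued for in this thread.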

> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker shouldn't start the driver via "ProcessBuilderLike", 
> because then we can't know whether the application's main thread has finished 
> if the application has spawned non-daemon threads: the JVM terminates only when 
> no non-daemon thread is running (or someone calls System.exit), so the main 
> thread may have finished long ago. 
> The worker should start the driver the way the AM of YARN does. As follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the 
> application's state. 






[jira] [Comment Edited] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827201#comment-15827201
 ] 

hustfxj edited comment on SPARK-19264 at 1/18/17 1:39 AM:
--

So I must change the last lines of the code like this:
{code}
try {
  ssc.awaitTermination()
} finally {
  System.exit(-1)
}
{code}

I don't think users should have to care about this issue. Since Spark provides 
awaitTermination(), the driver should exit once the main thread is done, whether 
or not it still contains unfinished non-daemon threads.



was (Author: hustfxj):
So I must change the last lines of the code like this:
{code}
try {
  ssc.awaitTermination()
} finally {
  System.exit(-1)
}
{code}

> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker shouldn't start the driver via "ProcessBuilderLike", 
> because then we can't know whether the application's main thread has finished 
> if the application has spawned non-daemon threads: the JVM terminates only when 
> no non-daemon thread is running (or someone calls System.exit), so the main 
> thread may have finished long ago. 
> The worker should start the driver the way the AM of YARN does. As follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the 
> application's state. 






[jira] [Commented] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827201#comment-15827201
 ] 

hustfxj commented on SPARK-19264:
-

So I must change the last lines of the code like this:
{code}
try {
  ssc.awaitTermination()
} finally {
  System.exit(-1)
}
{code}

> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker shouldn't start the driver via "ProcessBuilderLike", 
> because then we can't know whether the application's main thread has finished 
> if the application has spawned non-daemon threads: the JVM terminates only when 
> no non-daemon thread is running (or someone calls System.exit), so the main 
> thread may have finished long ago. 
> The worker should start the driver the way the AM of YARN does. As follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the 
> application's state. 






[jira] [Comment Edited] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827181#comment-15827181
 ] 

hustfxj edited comment on SPARK-19264 at 1/18/17 1:18 AM:
--

[~srowen] Sorry, I didn't say it clearly. I mean the driver can't finish while 
it still contains unfinished non-daemon threads. Look at the following example: 
the driver program should crash because of the exception, but in fact it can't, 
because the timer thread is still running.
{code:title=Test.scala|borderStyle=solid}   
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    sparkConf.set("spark.streaming.blockInterval", "1000ms")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    // non-daemon thread
    val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
      new ThreadFactory() {
        def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
      })
    scheduledExecutorService.scheduleAtFixedRate(
      new Runnable() {
        def run(): Unit = {
          try {
            System.out.println("runnable")
          } catch {
            case e: Exception =>
              System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
          }
        }
      }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)
    val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
    wordCounts.foreachRDD { rdd =>
      rdd.collect().foreach(println)
      throw new RuntimeException
    }
    ssc.start()
    ssc.awaitTermination()
{code}



was (Author: hustfxj):
[~srowen] Sorry, I didn't say it clearly. I mean the Spark application can't 
finish while it still contains unfinished non-daemon threads. Look at the 
following example: the driver program should crash because of the exception, but 
in fact it can't, because the timer thread is still running.
{code:title=Test.scala|borderStyle=solid}   
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    sparkConf.set("spark.streaming.blockInterval", "1000ms")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    // non-daemon thread
    val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
      new ThreadFactory() {
        def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
      })
    scheduledExecutorService.scheduleAtFixedRate(
      new Runnable() {
        def run(): Unit = {
          try {
            System.out.println("runnable")
          } catch {
            case e: Exception =>
              System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
          }
        }
      }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)
    val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
    wordCounts.foreachRDD { rdd =>
      rdd.collect().foreach(println)
      throw new RuntimeException
    }
    ssc.start()
    ssc.awaitTermination()
{code}


> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker shouldn't start the driver via "ProcessBuilderLike", 
> because then we can't know whether the application's main thread has finished 
> if the application has spawned non-daemon threads: the JVM terminates only when 
> no non-daemon thread is running (or someone calls System.exit), so the main 
> thread may have finished long ago. 
> The worker should start the driver the way the AM of YARN does. As follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the 
> application's state. 






[jira] [Comment Edited] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827181#comment-15827181
 ] 

hustfxj edited comment on SPARK-19264 at 1/18/17 1:16 AM:
--

[~srowen] Sorry, I didn't say it clearly. I mean the Spark application can't 
finish while it still contains unfinished non-daemon threads. Look at the 
following example: the driver program should crash because of the exception, but 
in fact it can't, because the timer thread is still running.
{code:title=Test.scala|borderStyle=solid}   
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    sparkConf.set("spark.streaming.blockInterval", "1000ms")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    // non-daemon thread
    val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
      new ThreadFactory() {
        def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
      })
    scheduledExecutorService.scheduleAtFixedRate(
      new Runnable() {
        def run(): Unit = {
          try {
            System.out.println("runnable")
          } catch {
            case e: Exception =>
              System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
          }
        }
      }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)
    val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
    wordCounts.foreachRDD { rdd =>
      rdd.collect().foreach(println)
      throw new RuntimeException
    }
    ssc.start()
    ssc.awaitTermination()
{code}
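The hang described in the comment is plain JVM semantics rather than anything Spark-specific: the JVM stays alive while any non-daemon thread is running. A minimal Java sketch (class name hypothetical) shows the usual fix of marking the timer thread as a daemon so it cannot block shutdown on its own:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DaemonExitDemo {
    public static void main(String[] args) throws InterruptedException {
        // Same shape as the timer in the example above, except the factory
        // marks the thread as a daemon, so it cannot keep the JVM alive.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "Driver-Commit-Thread");
            t.setDaemon(true); // the one-line difference from the example above
            return t;
        });
        timer.scheduleAtFixedRate(() -> System.out.println("tick"), 0, 5, TimeUnit.SECONDS);
        Thread.sleep(200); // let the first tick fire
        System.out.println("main exiting");
        // With setDaemon(true) the JVM exits here; with a non-daemon thread,
        // the scheduled task would keep the process alive indefinitely.
    }
}
```

With the original non-daemon factory, `main` returning does not end the process, which is exactly why the driver in the example never crashes.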



was (Author: hustfxj):
[~srowen] Sorry, I didn't say it clearly. I mean the Spark application can't finish while it still contains unfinished non-daemon threads. Look at the following example: the driver program should crash because of the exception, but in fact it can't, because the timer thread is still running.
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))
// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(
  new Runnable() {
    def run() {
      try {
        System.out.println("runnable")
      } catch {
        case e: Exception =>
          System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
      }
    }
  }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)
val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}


> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns non-daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
> The worker should start the driver the way YARN's ApplicationMaster does, as follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the application's state.
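The AM-style approach quoted above can be sketched outside Spark: invoke the user class's main method reflectively on a dedicated thread, then join that thread to learn exactly when the user's main logic finished, regardless of any extra non-daemon threads the user code started. All names below are hypothetical, not Spark's actual API:

```java
import java.lang.reflect.Method;

public class InProcessDriverRunner {

    /** Start userClass.main(args) on a dedicated, named thread and return it. */
    public static Thread launch(Class<?> userClass, String[] args) throws NoSuchMethodException {
        Method mainMethod = userClass.getMethod("main", String[].class);
        Thread driverThread = new Thread(() -> {
            try {
                mainMethod.invoke(null, (Object) args);
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException("user main failed", e);
            }
        }, "driver-main");
        driverThread.start();
        return driverThread;
    }

    /** Stand-in for a user application; for the demo only. */
    public static class DemoApp {
        public static void main(String[] args) {
            System.out.println("user main ran");
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t = launch(DemoApp.class, new String[0]);
        t.join(); // the worker now knows exactly when the user's main finished
        System.out.println("driver main finished");
    }
}
```

This is the property the report is after: a process-launching approach can only observe process exit, while an in-process launcher can join the main thread and report the application's state as soon as the user code returns.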






[jira] [Comment Edited] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827181#comment-15827181
 ] 

hustfxj edited comment on SPARK-19264 at 1/18/17 1:16 AM:
--

[~srowen] Sorry, I didn't say it clearly. I mean the Spark application can't finish while it still contains unfinished non-daemon threads. Look at the following example: the driver program should crash because of the exception, but in fact it can't, because the timer thread is still running.
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))
// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(
  new Runnable() {
    def run() {
      try {
        System.out.println("runnable")
      } catch {
        case e: Exception =>
          System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
      }
    }
  }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)
val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}



was (Author: hustfxj):
[~srowen] Sorry, I didn't say it clearly. I mean a Spark application can't finish while it still contains unfinished non-daemon threads. For example:
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))
// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(
  new Runnable() {
    def run() {
      try {
        System.out.println("runnable")
      } catch {
        case e: Exception =>
          System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
      }
    }
  }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)
val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}



> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns non-daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
> The worker should start the driver the way YARN's ApplicationMaster does, as follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the application's state.






[jira] [Comment Edited] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827181#comment-15827181
 ] 

hustfxj edited comment on SPARK-19264 at 1/18/17 1:12 AM:
--

[~srowen] Sorry, I didn't say it clearly. I mean a Spark application can't finish while it still contains unfinished non-daemon threads. For example:
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))
// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(
  new Runnable() {
    def run() {
      try {
        System.out.println("runnable")
      } catch {
        case e: Exception =>
          System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
      }
    }
  }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)
val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}




was (Author: hustfxj):
[~srowen] Sorry, I didn't say it clearly. I mean a Spark application can't finish while it still contains unfinished non-daemon threads. For example:
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))
// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(
  new Runnable() {
    def run() {
      try {
        System.out.println("runnable")
      } catch {
        case e: Exception =>
          System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
      }
    }
  }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)
val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}



> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns non-daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
> The worker should start the driver the way YARN's ApplicationMaster does, as follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the application's state.






[jira] [Commented] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827181#comment-15827181
 ] 

hustfxj commented on SPARK-19264:
-

[~srowen] Sorry, I didn't say it clearly. I mean a Spark application can't finish while it still contains unfinished non-daemon threads. For example:
{code:title=Test.scala|borderStyle=solid}
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.streaming.blockInterval", "1000ms")
val ssc = new StreamingContext(sparkConf, Seconds(10))
// non-daemon thread
val scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
  new ThreadFactory() {
    def newThread(r: Runnable): Thread = new Thread(r, "Driver-Commit-Thread")
  })
scheduledExecutorService.scheduleAtFixedRate(
  new Runnable() {
    def run() {
      try {
        System.out.println("runnable")
      } catch {
        case e: Exception =>
          System.out.println("ScheduledTask persistAllConsumerOffset exception: " + e)
      }
    }
  }, 1000, 1000 * 5, TimeUnit.MILLISECONDS)
val lines = ssc.receiverStream(new WordReceiver(StorageLevel.MEMORY_AND_DISK_2))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey((x: Int, y: Int) => x + y, 10)
wordCounts.foreachRDD { rdd =>
  rdd.collect().foreach(println)
  throw new RuntimeException
}
ssc.start()
ssc.awaitTermination()
{code}



> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns non-daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
> The worker should start the driver the way YARN's ApplicationMaster does, as follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the application's state.






[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19264:

Description: 
  I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns non-daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
The worker should start the driver the way YARN's ApplicationMaster does, as follows:

{code:title=ApplicationMaster.scala|borderStyle=solid}
 mainMethod.invoke(null, userArgs.toArray)
 finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
 logDebug("Done running users class")
{code}

Then the worker can monitor the driver's main thread and know the application's state.

  was:
  I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
The worker should start the driver the way YARN's ApplicationMaster does, as follows:

{code:title=ApplicationMaster.scala|borderStyle=solid}
 mainMethod.invoke(null, userArgs.toArray)
 finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
 logDebug("Done running users class")
{code}

Then the worker can monitor the driver's main thread and know the application's state.


> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns non-daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
> The worker should start the driver the way YARN's ApplicationMaster does, as follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the application's state.






[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19264:

Description: 
  I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
The worker should start the driver the way YARN's ApplicationMaster does, as follows:

{code:title=ApplicationMaster.scala|borderStyle=solid}
 mainMethod.invoke(null, userArgs.toArray)
 finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
 logDebug("Done running users class")
{code}

Then the worker can monitor the driver's main thread and know the application's state.

  was:
  I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
The worker should start the driver the way YARN's ApplicationMaster does, as follows:

{code:title=Bar.java|borderStyle=solid}
 mainMethod.invoke(null, userArgs.toArray)
 finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
 logDebug("Done running users class")
{code}

Then the worker can monitor the driver's main thread and know the application's state.


> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
> The worker should start the driver the way YARN's ApplicationMaster does, as follows:
> {code:title=ApplicationMaster.scala|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the application's state.






[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19264:

Description: 
  I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
The worker should start the driver the way YARN's ApplicationMaster does, as follows:

{code:title=Bar.java|borderStyle=solid}
 mainMethod.invoke(null, userArgs.toArray)
 finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
 logDebug("Done running users class")
{code}

Then the worker can monitor the driver's main thread and know the application's state.

  was:
  I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.

The worker should start the driver the way YARN's ApplicationMaster does, as follows:

```
 mainMethod.invoke(null, userArgs.toArray)
 finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
 logDebug("Done running users class")
```

Then the worker can monitor the driver's main thread and know the application's state.


> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
> The worker should start the driver the way YARN's ApplicationMaster does, as follows:
> {code:title=Bar.java|borderStyle=solid}
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> {code}
> Then the worker can monitor the driver's main thread and know the application's state.






[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19264:

Description: 
  I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.

The worker should start the driver the way YARN's ApplicationMaster does, as follows:

```
 mainMethod.invoke(null, userArgs.toArray)
 finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
 logDebug("Done running users class")
```

Then the worker can monitor the driver's main thread and know the application's state.

  was:
  I think work can't start driver by "ProcessBuilderLike",  thus we can't know 
the application's main thread is finished or not if the application's main 
thread contains some daemon threads. Because the program terminates when there 
no longer is any non-daemon thread running (or someone called System.exit). The 
main thread can have finished long ago. 

worker should  start driver like AM of YARN . As followed:

```
  mainMethod.invoke(null, userArgs.toArray)
 finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
 logDebug("Done running users class")
```
Then the work can monitor the driver's main thread, and know the application's 
state. 


> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
> The worker should start the driver the way YARN's ApplicationMaster does, as follows:
> 
> ```
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> ```
> Then the worker can monitor the driver's main thread and know the application's state.






[jira] [Updated] (SPARK-19264) Work should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19264:

Summary: Work should start driver, the same to  AM  of yarn   (was: Work 
maybe should start driver, the same to  AM  of yarn )

> Work should start driver, the same to  AM  of yarn 
> ---
>
> Key: SPARK-19264
> URL: https://issues.apache.org/jira/browse/SPARK-19264
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
>   I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.
> The worker should start the driver the way YARN's ApplicationMaster does, as follows:
> 
> ```
>  mainMethod.invoke(null, userArgs.toArray)
>  finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
>  logDebug("Done running users class")
> ```
> Then the worker can monitor the driver's main thread and know the application's state.






[jira] [Created] (SPARK-19264) Work maybe should start driver, the same to AM of yarn

2017-01-17 Thread hustfxj (JIRA)
hustfxj created SPARK-19264:
---

 Summary: Work maybe should start driver, the same to  AM  of yarn 
 Key: SPARK-19264
 URL: https://issues.apache.org/jira/browse/SPARK-19264
 Project: Spark
  Issue Type: Improvement
Reporter: hustfxj


  I think the worker can't start the driver via "ProcessBuilderLike", because then we can't tell whether the application's main thread has finished when the application spawns daemon threads: the JVM terminates only once no non-daemon thread is left running (or someone calls System.exit), so the main thread may have finished long ago.

The worker should start the driver the way YARN's ApplicationMaster does, as follows:

```
 mainMethod.invoke(null, userArgs.toArray)
 finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
 logDebug("Done running users class")
```

Then the worker can monitor the driver's main thread and know the application's state.






[jira] [Created] (SPARK-19052) the rest api don't support multiple standby masters on standalone cluster

2017-01-02 Thread hustfxj (JIRA)
hustfxj created SPARK-19052:
---

 Summary: the rest api don't support multiple standby masters on 
standalone cluster
 Key: SPARK-19052
 URL: https://issues.apache.org/jira/browse/SPARK-19052
 Project: Spark
  Issue Type: Bug
Reporter: hustfxj


When a job is submitted through the REST API, the driver only knows the single master address that comes from StandaloneRestServer's masterUrl. So we should give priority to the "spark.master" value supplied in "sparkProperties".
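The proposed priority rule is just a lookup with a fallback; a minimal sketch, with hypothetical names (not Spark's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class MasterResolver {
    // Prefer an explicit "spark.master" from the submitted properties (which can
    // list several masters) over the REST endpoint's single masterUrl fallback.
    static String effectiveMaster(Map<String, String> sparkProperties, String restMasterUrl) {
        return sparkProperties.getOrDefault("spark.master", restMasterUrl);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("spark.master", "spark://master1:7077,master2:7077");
        // Submission carries both masters, so standby masters stay reachable:
        System.out.println(effectiveMaster(props, "spark://master1:7077"));
        // Without the property, fall back to the REST server's own master URL:
        System.out.println(effectiveMaster(new HashMap<>(), "spark://master1:7077"));
    }
}
```

The point is that the comma-separated multi-master list survives submission only if the client-supplied property wins over the single URL the REST server knows about.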







[jira] [Commented] (SPARK-18959) invalid resource statistics for standalone cluster

2017-01-01 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15791056#comment-15791056
 ] 

hustfxj commented on SPARK-18959:
-

Yes, I have a fix. You can see the link 
https://github.com/apache/spark/pull/16446

> invalid resource statistics for standalone cluster
> --
>
> Key: SPARK-18959
> URL: https://issues.apache.org/jira/browse/SPARK-18959
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: hustfxj
>Priority: Minor
> Attachments: 屏幕快照 2016-12-21 11.49.12.png
>
>
> Workers:
> || Worker Id || Address || State || Cores || Memory ||
> | worker-20161220162751-10.125.6.222-59295 | 10.125.6.222:59295 | ALIVE | 4 (-1 Used) | 6.8 GB (-1073741824.0 B Used) |
> | worker-20161220164233-10.218.135.80-10944 | 10.218.135.80:10944 | ALIVE | 4 (0 Used) | 6.8 GB (0.0 B Used) |






[jira] [Commented] (SPARK-19042) Remove query string from jar url for executor

2016-12-31 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15789488#comment-15789488
 ] 

hustfxj commented on SPARK-19042:
-

https://github.com/apache/spark/pull/16443


> Remove query string from jar url for executor
> -
>
> Key: SPARK-19042
> URL: https://issues.apache.org/jira/browse/SPARK-19042
> Project: Spark
>  Issue Type: Bug
>Reporter: hustfxj
>
> spark.jars supports jar URLs with the http protocol. However, if the URL 
> contains a query string, "localName = name.split("/").last" won't extract the 
> expected jar name, and then "val url = new File(SparkFiles.getRootDirectory(), 
> localName).toURI.toURL" produces an invalid URL. The fix is the same as 
> [SPARK-17855].






[jira] [Created] (SPARK-19042) Remove query string from jar url for executor

2016-12-31 Thread hustfxj (JIRA)
hustfxj created SPARK-19042:
---

 Summary: Remove query string from jar url for executor
 Key: SPARK-19042
 URL: https://issues.apache.org/jira/browse/SPARK-19042
 Project: Spark
  Issue Type: Bug
Reporter: hustfxj


spark.jars supports jar URLs with the http protocol. However, if the URL contains a query string, "localName = name.split("/").last" won't extract the expected jar name, and then "val url = new File(SparkFiles.getRootDirectory(), localName).toURI.toURL" produces an invalid URL. The fix is the same as [SPARK-17855].
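The fix amounts to deriving the local file name from the URL's path component rather than splitting the raw string; a minimal sketch (helper name hypothetical):

```java
import java.net.URI;

public class JarName {
    // Derive the local file name from the URL *path* only; URI.getPath()
    // drops any "?query" and "#fragment" suffixes before we split.
    static String localName(String url) {
        String path = URI.create(url).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // prints "app.jar"; a naive split("/").last would yield "app.jar?token=abc"
        System.out.println(localName("http://repo.example.com/libs/app.jar?token=abc"));
    }
}
```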






[jira] [Updated] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster

2016-12-27 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19011:

Description: A large standalone cluster may have lots of applications and 
drivers, so from the master page I may not know which driver started which 
application. So I think we can add the driver ID to the ApplicationDescription. 
In fact, I have already implemented this!

> ApplicationDescription should add the Submission ID for the standalone cluster
> --
>
> Key: SPARK-19011
> URL: https://issues.apache.org/jira/browse/SPARK-19011
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
> A large standalone cluster may have lots of applications and drivers, so from 
> the master page I may not know which driver started which application. So I 
> think we can add the driver ID to the ApplicationDescription. In fact, I have 
> already implemented this!






[jira] [Commented] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster

2016-12-27 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780310#comment-15780310
 ] 

hustfxj commented on SPARK-19011:
-

A large standalone cluster may have lots of applications and drivers, so from 
the master page I may not know which driver started which application. I 
think we can add the driver ID to the ApplicationDescription; in fact, I have 
already implemented this.

> ApplicationDescription should add the Submission ID for the standalone cluster
> --
>
> Key: SPARK-19011
> URL: https://issues.apache.org/jira/browse/SPARK-19011
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>







[jira] [Created] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster

2016-12-27 Thread hustfxj (JIRA)
hustfxj created SPARK-19011:
---

 Summary: ApplicationDescription should add the Submission ID for 
the standalone cluster
 Key: SPARK-19011
 URL: https://issues.apache.org/jira/browse/SPARK-19011
 Project: Spark
  Issue Type: Improvement
Reporter: hustfxj









[jira] [Commented] (SPARK-18959) invalid resource statistics for standalone cluster

2016-12-21 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768696#comment-15768696
 ] 

hustfxj commented on SPARK-18959:
-

2.2.0-SNAPSHOT

> invalid resource statistics for standalone cluster
> --
>
> Key: SPARK-18959
> URL: https://issues.apache.org/jira/browse/SPARK-18959
> Project: Spark
>  Issue Type: Bug
>Reporter: hustfxj
> Attachments: 屏幕快照 2016-12-21 11.49.12.png
>
>
> Workers
> Worker Id                                  Address              State  Cores        Memory
> worker-20161220162751-10.125.6.222-59295   10.125.6.222:59295   ALIVE  4 (-1 Used)  6.8 GB (-1073741824.0 B Used)
> worker-20161220164233-10.218.135.80-10944  10.218.135.80:10944  ALIVE  4 (0 Used)   6.8 GB (0.0 B Used)






[jira] [Updated] (SPARK-18959) invalid resource statistics for standalone cluster

2016-12-20 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-18959:

Attachment: 屏幕快照 2016-12-21 11.49.12.png

The attachment is a screenshot of the master page.

> invalid resource statistics for standalone cluster
> --
>
> Key: SPARK-18959
> URL: https://issues.apache.org/jira/browse/SPARK-18959
> Project: Spark
>  Issue Type: Bug
>Reporter: hustfxj
> Attachments: 屏幕快照 2016-12-21 11.49.12.png
>
>
> Workers
> Worker Id                                  Address              State  Cores        Memory
> worker-20161220162751-10.125.6.222-59295   10.125.6.222:59295   ALIVE  4 (-1 Used)  6.8 GB (-1073741824.0 B Used)
> worker-20161220164233-10.218.135.80-10944  10.218.135.80:10944  ALIVE  4 (0 Used)   6.8 GB (0.0 B Used)






[jira] [Commented] (SPARK-18959) invalid resource statistics for standalone cluster

2016-12-20 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766020#comment-15766020
 ] 

hustfxj commented on SPARK-18959:
-

The attachment is a screenshot of the master page.

> invalid resource statistics for standalone cluster
> --
>
> Key: SPARK-18959
> URL: https://issues.apache.org/jira/browse/SPARK-18959
> Project: Spark
>  Issue Type: Bug
>Reporter: hustfxj
>
> Workers
> Worker Id                                  Address              State  Cores        Memory
> worker-20161220162751-10.125.6.222-59295   10.125.6.222:59295   ALIVE  4 (-1 Used)  6.8 GB (-1073741824.0 B Used)
> worker-20161220164233-10.218.135.80-10944  10.218.135.80:10944  ALIVE  4 (0 Used)   6.8 GB (0.0 B Used)






[jira] [Created] (SPARK-18959) invalid resource statistics for standalone cluster

2016-12-20 Thread hustfxj (JIRA)
hustfxj created SPARK-18959:
---

 Summary: invalid resource statistics for standalone cluster
 Key: SPARK-18959
 URL: https://issues.apache.org/jira/browse/SPARK-18959
 Project: Spark
  Issue Type: Bug
Reporter: hustfxj



Workers

Worker Id                                  Address              State  Cores        Memory
worker-20161220162751-10.125.6.222-59295   10.125.6.222:59295   ALIVE  4 (-1 Used)  6.8 GB (-1073741824.0 B Used)
worker-20161220164233-10.218.135.80-10944  10.218.135.80:10944  ALIVE  4 (0 Used)   6.8 GB (0.0 B Used)






[jira] [Closed] (SPARK-18706) Can spark support exactly once based kafka ? Due to these following question.

2016-12-04 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj closed SPARK-18706.
---
Resolution: Duplicate

> Can spark  support exactly once based kafka ? Due to these following question.
> --
>
> Key: SPARK-18706
> URL: https://issues.apache.org/jira/browse/SPARK-18706
> Project: Spark
>  Issue Type: Question
>Reporter: hustfxj
>







[jira] [Commented] (SPARK-18706) Can spark support exactly once based kafka ? Due to these following question.

2016-12-04 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720087#comment-15720087
 ] 

hustfxj commented on SPARK-18706:
-

1. When a task completes its operation, it notifies the driver. The driver 
may not receive that message due to a network failure and will still consider 
the task running, so won't the child stage never be scheduled?
2. How does Spark guarantee that a downstream task receives the shuffle data 
completely? I can't find any checksum for blocks in Spark. For example, an 
upstream task may shuffle 100 MB of data, but the downstream task may receive 
only 99 MB due to the network. Can Spark verify, based on size, that the data 
was received completely?
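The size-and-checksum verification asked about in point 2 could, in 
principle, look like the following sketch (a hypothetical helper, not Spark's 
actual shuffle code): the sender ships the block's byte count and a CRC32 
checksum alongside the data, and the receiver recomputes both before 
accepting the block.

```scala
import java.util.zip.CRC32

// Hypothetical integrity check for a received shuffle block: reject the
// block if either its length or its CRC32 checksum disagrees with what
// the upstream task declared.
def blockIntact(data: Array[Byte], expectedLen: Long, expectedCrc: Long): Boolean = {
  if (data.length.toLong != expectedLen) return false
  val crc = new CRC32
  crc.update(data)
  crc.getValue == expectedCrc
}
```

A length check alone would catch the truncated 99 MB transfer described 
above; the checksum additionally catches same-length corruption.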

> Can spark  support exactly once based kafka ? Due to these following question.
> --
>
> Key: SPARK-18706
> URL: https://issues.apache.org/jira/browse/SPARK-18706
> Project: Spark
>  Issue Type: Question
>Reporter: hustfxj
>







[jira] [Created] (SPARK-18706) Can spark support exactly once based kafka ? Due to these following question.

2016-12-04 Thread hustfxj (JIRA)
hustfxj created SPARK-18706:
---

 Summary: Can spark  support exactly once based kafka ? Due to 
these following question.
 Key: SPARK-18706
 URL: https://issues.apache.org/jira/browse/SPARK-18706
 Project: Spark
  Issue Type: Question
Reporter: hustfxj









[jira] [Created] (SPARK-18707) Can spark support exactly once based kafka ? Due to these following question?

2016-12-04 Thread hustfxj (JIRA)
hustfxj created SPARK-18707:
---

 Summary: Can spark  support exactly once based kafka ? Due to 
these following question?
 Key: SPARK-18707
 URL: https://issues.apache.org/jira/browse/SPARK-18707
 Project: Spark
  Issue Type: Question
Reporter: hustfxj


1. When a task completes its operation, it notifies the driver. The driver 
may not receive that message due to a network failure and will still consider 
the task running, so won't the child stage never be scheduled?
2. How does Spark guarantee that a downstream task receives the shuffle data 
completely? I can't find any checksum for blocks in Spark. For example, an 
upstream task may shuffle 100 MB of data, but the downstream task may receive 
only 99 MB due to the network. Can Spark verify, based on size, that the data 
was received completely?






[jira] [Commented] (SPARK-18573) when restart yarn cluster, the spark streaming job will lost ?

2016-11-23 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691986#comment-15691986
 ] 

hustfxj commented on SPARK-18573:
-

When I restart the YARN cluster, the Spark Streaming jobs are lost, so I have 
to resubmit all of them. How can we ensure the streaming jobs are not lost 
while the YARN cluster restarts?

> when restart yarn cluster,  the spark streaming job will lost ?
> ---
>
> Key: SPARK-18573
> URL: https://issues.apache.org/jira/browse/SPARK-18573
> Project: Spark
>  Issue Type: Question
>Reporter: hustfxj
>







[jira] [Created] (SPARK-18573) when restart yarn cluster, the spark streaming job will lost ?

2016-11-23 Thread hustfxj (JIRA)
hustfxj created SPARK-18573:
---

 Summary: when restart yarn cluster,  the spark streaming job will 
lost ?
 Key: SPARK-18573
 URL: https://issues.apache.org/jira/browse/SPARK-18573
 Project: Spark
  Issue Type: Question
Reporter: hustfxj





