[GitHub] spark pull request: [SPARK-2301] add ability to submit multiple ja...

2014-09-16 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1113#issuecomment-55746200 @andrewor14 I updated it per your comments. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-2301] add ability to submit multiple ja...

2014-09-16 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1113#issuecomment-55842735 @vanzin @andrewor14 I think a path passed to spark-submit --jars must be on the local slave node. If the path is on HDFS, it cannot download from HDFS to a local path on the slave

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-22 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/1528 use config spark.scheduler.priority for specifying TaskSet's priority on DAGScheduler https://issues.apache.org/jira/browse/SPARK-2618 You can merge this pull request into a Git repository

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-22 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49756393 It adds a user-defined priority to FIFO. If the user does not configure a priority, it works as before. It is non-preemptive. When there are free executors we can let high
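The comment above describes a user-defined priority layered on top of FIFO. A minimal sketch of that idea follows; it is not the code from the pull request. `TaskSetKey` and `PriorityFifo` are hypothetical names, and the assumption that a larger priority value runs first is ours, since the snippet does not say which direction the setting takes.

```scala
// Hypothetical stand-in for a TaskSet's scheduling keys; not Spark's real class.
final case class TaskSetKey(userPriority: Int, jobId: Int, stageId: Int)

object PriorityFifo {
  // Assumption: a larger userPriority runs first. Ties fall back to plain FIFO
  // order (lower jobId, then lower stageId), so leaving every priority at the
  // same default value reproduces the unmodified FIFO behaviour.
  val ordering: Ordering[TaskSetKey] =
    Ordering.by((k: TaskSetKey) => (-k.userPriority, k.jobId, k.stageId))

  // Non-preemptive: this only picks which waiting task set gets the next free
  // executor slot; nothing that is already running is interrupted.
  def nextToRun(waiting: Seq[TaskSetKey]): Option[TaskSetKey] =
    if (waiting.isEmpty) None else Some(waiting.min(ordering))
}
```

Because the priority is only the first sort key, an application that never sets it keeps the job-then-stage ordering it had before.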

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-22 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49830100 @markhamstra @CodingCat Thank you for the comments. I updated the patch, please review again.

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-22 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49833109 @markhamstra Thank you. I updated the patch. Any more comments?

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49846832 @markhamstra Thank you. How about the latest code?

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49849632 @markhamstra @pwendell I have updated SPARK-2618, please take a look. Thanks.

[GitHub] spark pull request: SPARK-2298: Show stage attempt in UI

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1384#issuecomment-49860984 I think you can add the job ID to stageTable, because the job ID is very useful when an application has many jobs: it distinguishes each job's stages.

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49862687 I do not think priority is useful for the FAIR scheduler. In the YARN scheduler, priority works with FIFO and not with FAIR. So I think the Spark application's scheduler mode

[GitHub] spark pull request: use config spark.scheduler.priority for specif...

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1528#issuecomment-49864527 Maybe I misunderstood you. With FAIR, Schedulable.weight can replace priority. Do you mean that with FAIR we can provide a weight config to the user? Example: spark.scheduler.weight

[GitHub] spark pull request: [SPARK-2298] Encode stage attempt in SparkList...

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1545#issuecomment-49869232 I think we can add the job ID to stageTable, because the job ID is very useful when an application has many jobs: it distinguishes each job's stages.

[GitHub] spark pull request: Spark 2037: yarn client mode doesn't support s...

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1180#issuecomment-49883664 I think that in a long-running application, maxNumExecutorFailures is sometimes reached for YARN-side reasons, but YARN quickly provides Spark with new containers. Although

[GitHub] spark pull request: Spark 2037: yarn client mode doesn't support s...

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1180#issuecomment-49886521 @tgravescs I think that if YARN gives the application more executors, the application will continue to work and does not need maxNumExecutorFailures. I think

[GitHub] spark pull request: [SPARK-2037]: yarn client mode doesn't support...

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1180#issuecomment-49890478 Thank you. But I think some errors, such as disk failures or machines going down, should not be included in it. So maybe we need to exclude some of the errors that YARN reports. What do you think

[GitHub] spark pull request: [SPARK-2037]: yarn client mode doesn't support...

2014-07-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1180#issuecomment-49892019 I see. I am OK with this PR. Thanks.

[GitHub] spark pull request: through shuffling blocksByAddress avoid much r...

2014-07-23 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/1549 through shuffling blocksByAddress avoid much reducers to fetch data from a executor at a time Like MapReduce, we need to shuffle blocksByAddress. This can avoid many reducers connecting to an executor

[GitHub] spark pull request: [SPARK-2666] when task failed with FetchFailed...

2014-07-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1572#issuecomment-50126087 OK, I understand your idea. The current implementation lets the remaining tasks run. But that has a problem if one of the remaining tasks is writing to HDFS and other new

[GitHub] spark pull request: add ability to submit multiple jars for Driver

2014-07-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1113#issuecomment-50126195 @andrewor14 Can you take a look at this? Thanks.

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-07-25 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/1589 [SPARK-2687] [yarn]amClient should remove ContainerRequest In https://issues.apache.org/jira/browse/YARN-1902, after receiving allocated containers, if amClient does not remove the ContainerRequest, RM

[GitHub] spark pull request: [SPARK-2675] Increase EVENT_QUEUE_CAPACITY by ...

2014-07-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1579#issuecomment-50146398 I want to know how much memory @shivaram was referring to before. @concretevitamin, can you show the numbers? Thanks.

[GitHub] spark pull request: SPARK-2298: Show stage attempt in UI

2014-07-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1384#issuecomment-50149066 @tsudukim Yes, SPARK-2298 is what I want. But I think a simple way is to add a job ID column to the stage table in this PR. It is very easy to achieve.

[GitHub] spark pull request: [SPARK-2648] through shuffling blocksByAddress...

2014-07-27 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1549#issuecomment-50293186 Good question. I understand more now and think that this PR is unnecessary. Thank you.

[GitHub] spark pull request: [SPARK-2648] through shuffling blocksByAddress...

2014-07-27 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/1549

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-07-29 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1589#issuecomment-50475830 @witgo @andrewor14 Please take a look at it. Thanks.

[GitHub] spark pull request: [SPARK-2677] BasicBlockFetchIterator#next can ...

2014-07-30 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/1632#discussion_r15573389 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala --- @@ -117,31 +121,45 @@ object BlockFetcherIterator

[GitHub] spark pull request: add ability to submit multiple jars for Driver

2014-06-18 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/1113 add ability to submit multiple jars for Driver

[GitHub] spark pull request: discarded exceeded completedDrivers

2014-06-18 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/1114 discarded exceeded completedDrivers When the number of completedDrivers exceeds the threshold, the first Max(spark.deploy.retainedDrivers, 1) will be discarded.
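The trimming rule in this description can be sketched as a bounded buffer that drops its oldest entries before accepting a new one. This is an illustration only, not the Master's actual code: `RetainedDrivers` and `addCompleted` are hypothetical names, and the number of entries to drop is left as a caller-supplied parameter so the sketch does not assert more than the description does.

```scala
import scala.collection.mutable.ArrayBuffer

object RetainedDrivers {
  // Append a finished driver, first dropping the oldest entries if the buffer
  // has already reached `retained`. `toRemove` is supplied by the caller and is
  // clamped to at least 1 and at most the current buffer size.
  def addCompleted[T](completed: ArrayBuffer[T], driver: T, retained: Int, toRemove: Int): Unit = {
    if (completed.size >= retained) {
      completed.remove(0, math.min(math.max(toRemove, 1), completed.size))
    }
    completed += driver
  }
}
```

Dropping from the front keeps the most recently completed drivers, which bounds the memory the list can use.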

[GitHub] spark pull request: discarded exceeded completedDrivers

2014-07-15 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-48995170 @andrewor14 I have created a JIRA issue, SPARK-2302. Yes, it is for reducing the Master's memory use. Thank you.

[GitHub] spark pull request: discarded exceeded completedDrivers

2014-07-16 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1114#issuecomment-49173609 @CodingCat Thanks, I have created a JIRA issue: https://issues.apache.org/jira/browse/SPARK-2524

[GitHub] spark pull request: [SPARK-2524] missing document about spark.depl...

2014-07-16 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/1443 [SPARK-2524] missing document about spark.deploy.retainedDrivers https://issues.apache.org/jira/browse/SPARK-2524 The configuration on spark.deploy.retainedDrivers is undocumented

[GitHub] spark pull request: [SPARK-2524] missing document about spark.depl...

2014-07-16 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1443#issuecomment-49249536 @pwendell Thanks. I have addressed your comments.

[GitHub] spark pull request: [SPARK-2460] Optimize SparkContext.hadoopFile ...

2014-07-17 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/1385#discussion_r15055059 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -128,25 +123,13 @@ class HadoopRDD[K, V]( // Returns a JobConf

[GitHub] spark pull request: SPARK-2058: Overriding config from SPARK_HOME ...

2014-07-17 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/997#discussion_r15095873 --- Diff: bin/compute-classpath.sh --- @@ -30,6 +30,11 @@ FWDIR=$(cd `dirname $0`/..; pwd) # Build up classpath CLASSPATH=$SPARK_CLASSPATH

[GitHub] spark pull request: SPARK-2380 [WIP]: Support displaying accumulat...

2014-07-18 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/1309#discussion_r15113632 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -217,6 +223,7 @@ private[ui] class StagePage(parent: JobProgressTab

[GitHub] spark pull request: [SPARK-2522] set default broadcast factory to ...

2014-07-18 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1437#issuecomment-49439644 But when the broadcast's size is 1G, TorrentBroadcast hits an "array size exceeds" error. Utils.serialize() converts the object to an Array[Byte]. When the broadcast object's

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-20 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1253#issuecomment-49542251 How about -Dspark.app.name=blah? The JVM and Hadoop both use the -D flag to represent configuration properties.
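To illustrate the -D convention suggested above, here is a small sketch that separates `-Dkey=value` arguments from the rest of a command line. `DashDProps` is a hypothetical helper written for this example; it is not how spark-submit parses its arguments.

```scala
object DashDProps {
  // Split arguments into (-Dkey=value properties, everything else).
  // An argument counts as a property only if it starts with "-D" and has a
  // non-empty key before the first '='.
  def parse(args: Seq[String]): (Map[String, String], Seq[String]) = {
    val (props, rest) = args.partition(a => a.startsWith("-D") && a.indexOf('=') > 2)
    val kvs = props.map { a =>
      val eq = a.indexOf('=')
      a.substring(2, eq) -> a.substring(eq + 1)
    }
    (kvs.toMap, rest)
  }
}
```

For example, `DashDProps.parse(Seq("-Dspark.app.name=blah", "app.jar"))` yields the property `spark.app.name` with value `blah` and leaves `app.jar` as a positional argument.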

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/634#discussion_r15156184 --- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientClusterScheduler.scala --- @@ -37,14 +37,4 @@ private[spark] class

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/634#discussion_r15156221 --- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala --- @@ -30,6 +30,11 @@ private[spark] class

[GitHub] spark pull request: add ability to submit multiple jars for Driver

2014-09-04 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1113#issuecomment-54457835 @JoshRosen @andrewor14 I have updated the comment. Are there any questions about it?

[GitHub] spark pull request: Add a function that can build an EdgePartition...

2014-09-10 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/792#issuecomment-55122492 @ankurdave Yes, I agree with you. Like this: def toEdgePartition(sort: Boolean = true): EdgePartition[ED, VD] = { }

[GitHub] spark pull request: [SPARK-4195][Core]retry to fetch blocks's resu...

2014-11-02 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/3061 [SPARK-4195][Core]retry to fetch blocks's result when fetchfailed's reason is connection timeout When there are many executors in an application (for example, 1000), connection timeouts often occur

[GitHub] spark pull request: [SPARK-4249][GraphX]fix a problem of EdgeParti...

2014-11-06 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/3138 [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx At first, srcIds is not initialized and its entries are all 0, so we use edgeArray(0).srcId to initialize currSrcId.
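The initialization described here can be illustrated with a simplified index builder over edges sorted by source ID. This is a sketch of the general pattern, not the EdgePartitionBuilder code: `SimpleEdge` and `SrcIndex` are hypothetical names, and the real builder also handles edge attributes and vertex maps.

```scala
import scala.collection.mutable

// Hypothetical, simplified edge type; GraphX's real Edge also carries an attribute.
final case class SimpleEdge(srcId: Long, dstId: Long)

object SrcIndex {
  // For edges already sorted by srcId, record the position where each source
  // ID's run of edges begins.
  def build(edgeArray: Array[SimpleEdge]): Map[Long, Int] = {
    val index = mutable.LinkedHashMap.empty[Long, Int]
    if (edgeArray.nonEmpty) {
      // Seed currSrcId from the first edge rather than from a default of 0,
      // so the first run is indexed whatever its source ID is.
      var currSrcId = edgeArray(0).srcId
      index(currSrcId) = 0
      var i = 1
      while (i < edgeArray.length) {
        if (edgeArray(i).srcId != currSrcId) {
          currSrcId = edgeArray(i).srcId
          index(currSrcId) = i
        }
        i += 1
      }
    }
    index.toMap
  }
}
```

Seeding from the first element means a partition whose first source ID happens to be 0 is treated the same way as any other.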

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-11-06 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1589#issuecomment-62085093 @tgravescs I have taken a look at the latest version and confirmed that the problem still exists, because when amClient receives containers from YARN's RM, amClient needs

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-11-10 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1589#issuecomment-62489097 Yes, the scenario you described is one of the situations. Another is: Spark requests 2 containers (YARN request total = 2), YARN allocates 2 and gives

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-11-12 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1589#issuecomment-62720066 I think the RM will allocate more than one to Spark's AM when an executor fails. Here is a scenario: 1. Spark requests 3 containers (AM RM request total = 3) 2

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-11-13 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/3245 [SPARK-2687] [yarn]amClient should remove ContainerRequest In https://issues.apache.org/jira/browse/YARN-1902, after receiving allocated containers, if amClient does not remove the ContainerRequest, RM

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-11-13 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1589#issuecomment-62895644 Yes, I created a new PR, https://github.com/apache/spark/pull/3245, for the latest code.

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-11-13 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/1589

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-11-18 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3245#discussion_r20504747 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala --- @@ -43,10 +44,20 @@ private[yarn] class

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-11-18 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3245#discussion_r20559836 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala --- @@ -43,10 +44,20 @@ private[yarn] class

[GitHub] spark pull request: [SPARK-4534][Core]JavaSparkContext create new ...

2014-11-21 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/3403 [SPARK-4534][Core]JavaSparkContext create new constructor to support preferredNodeLocalityData with YARN create new constructor to support preferredNodeLocalityData with YARN example

[GitHub] spark pull request: [Spark 2387][Core]remove stage barrier

2014-11-24 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/3430 [Spark 2387][Core]remove stage barrier Based on https://github.com/apache/spark/pull/1328. When one task of the parent stage is not finished, other executors are idle. We can pre-start

[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...

2014-11-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-64192799 I think spark.yarn.driver.extraJavaOptions or spark.yarn.executor.extraJavaOptions is better.

[GitHub] spark pull request: [SPARK-4534][Core]JavaSparkContext create new ...

2014-11-25 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/3403

[GitHub] spark pull request: [SPARK-4534][Core]JavaSparkContext create new ...

2014-11-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3403#issuecomment-64429299 OK, I will close this. Thanks.

[GitHub] spark pull request: [SPARK-4516] Avoid allocating Netty PooledByte...

2014-11-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3465#issuecomment-64516973 Thanks, Aaron. That's great.

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-05 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12268618 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -523,6 +504,81 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-05 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12310911 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -523,6 +504,89 @@ private[spark] class Master

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-05 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12310983 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -523,6 +504,89 @@ private[spark] class Master

[GitHub] spark pull request: [WIP]SPARK-1706: Allow multiple executors per ...

2014-05-11 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12507640 --- Diff: core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala --- @@ -28,6 +28,7 @@ private[spark] class ApplicationDescription

[GitHub] spark pull request: [WIP]SPARK-1706: Allow multiple executors per ...

2014-05-11 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12509207 --- Diff: core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala --- @@ -28,6 +28,7 @@ private[spark] class ApplicationDescription

[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-12 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/731#discussion_r12564010 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -466,30 +466,14 @@ private[spark] class Master( * launched

[GitHub] spark pull request: [WIP]SPARK-1706: Allow multiple executors per ...

2014-05-15 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12472994 --- Diff: core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala --- @@ -28,6 +28,7 @@ private[spark] class ApplicationDescription

[GitHub] spark pull request: [WIP]SPARK-1706: Allow multiple executors per ...

2014-05-16 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/636#discussion_r12464105 --- Diff: core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala --- @@ -28,6 +28,7 @@ private[spark] class ApplicationDescription

[GitHub] spark pull request: [SPARK-1886] check executor id existence when ...

2014-05-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/827#issuecomment-44000844 Great, zhpengg. This problem appeared in our environment and we fixed it, so we should quickly merge this into the 1.0 release.

[GitHub] spark pull request: bugfix worker DriverStateChanged state should ...

2014-05-23 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/864 bugfix worker DriverStateChanged state should match DriverState.FAILED

[GitHub] spark pull request: bugfix worker DriverStateChanged state should ...

2014-05-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/864#issuecomment-44158350 Can anyone verify this patch? Thanks.

[GitHub] spark pull request: [SPARK-2301] add ability to submit multiple ja...

2014-12-17 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/1113

[GitHub] spark pull request: [SPARK-2301] add ability to submit multiple ja...

2014-12-17 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/1113#issuecomment-67427901 Yes, I think we can close this PR.

[GitHub] spark pull request: [SPARK-4195][Core]retry to fetch blocks's resu...

2014-12-23 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3061#discussion_r22245434 --- Diff: core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala --- @@ -39,6 +41,10 @@ final class NioBlockTransferService(conf

[GitHub] spark pull request: [SPARK-4195][Core]retry to fetch blocks's resu...

2014-12-23 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3061#discussion_r22245488 --- Diff: core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala --- @@ -121,8 +132,34 @@ final class NioBlockTransferService

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-12-24 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/3245

[GitHub] spark pull request: [SPARK-2687] [yarn]amClient should remove Cont...

2014-12-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3245#issuecomment-68059290 OK, I will close this PR.

[GitHub] spark pull request: [SPARK-4195][Core]retry to fetch blocks's resu...

2014-12-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3061#issuecomment-68059363 OK. I will close this PR.

[GitHub] spark pull request: [SPARK-4195][Core]retry to fetch blocks's resu...

2014-12-24 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/3061

[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...

2014-12-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3797#issuecomment-68103677 @XuTingjun Yes, I agree with you. We should run parseArgs before using the amMemory and executorMemory config values, because parseArgs can change these values from args

[GitHub] spark pull request: [SPARK-4994][network]Cleanup removed executors...

2014-12-29 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/3828 [SPARK-4994][network]Cleanup removed executors' ShuffleInfo in yarn shuffle service When the application is completed, YARN's NodeManager can remove the application's local-dirs. But all executors

[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...

2015-02-03 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4258#issuecomment-72780420 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-5593][Core]Replace BlockManagerListener...

2015-02-04 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4369 [SPARK-5593][Core]Replace BlockManagerListener with ExecutorListener in ExecutorAllocationListener More strictly, in ExecutorAllocationListener, we need to replace onBlockManagerAdded

[GitHub] spark pull request: [SPARK-5530] Add executor container to executo...

2015-02-02 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4309#issuecomment-72455127 LGTM

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-02 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72578648 @andrewor14 Thank you. Regarding SPARK_HOME, we need to consider compatibility in two places. The first is communication between the Python context and the Scala context

[GitHub] spark pull request: [SPARK-5093] Set spark.network.timeout to 120s...

2015-02-02 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3903#discussion_r23982799 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -52,11 +52,7 @@ class BlockManagerMasterActor(val isLocal

[GitHub] spark pull request: [SPARK-5636] Ramp up faster in dynamic allocat...

2015-02-05 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4409#issuecomment-73188040 LGTM

[GitHub] spark pull request: [SPARK-5529][Core]Replace blockManager's timeo...

2015-02-04 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4367 [SPARK-5529][Core]Replace blockManager's timeoutChecking with executor's timeoutChecking The phenomenon is: blockManagerSlave times out and BlockManagerMasterActor will remove

[GitHub] spark pull request: [SPARK-5653][YARN] In ApplicationMaster rename...

2015-02-06 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4430 [SPARK-5653][YARN] In ApplicationMaster rename isDriver to isClusterMode In ApplicationMaster, rename isDriver to isClusterMode, because Client uses isClusterMode, ApplicationMaster should

[GitHub] spark pull request: SPARK-4337. [YARN] Add ability to cancel pendi...

2015-02-06 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4141#discussion_r24241097 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -124,10 +123,12 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: [SPARK-4879] Use driver to coordinate Hadoop o...

2015-02-06 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r24240721 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -148,6 +148,20 @@ case object TaskKilled extends TaskFailedReason

[GitHub] spark pull request: [SPARK-4879] Use driver to coordinate Hadoop o...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r24315166 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,24 +106,61 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: SPARK-1714. Take advantage of AMRMClient APIs ...

2015-01-14 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3765#discussion_r22984790 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -153,498 +154,241 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: SPARK-1714. Take advantage of AMRMClient APIs ...

2015-01-14 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3765#discussion_r22984854 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -153,498 +154,241 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: [SPARK-4630][Core]Dynamically determine optima...

2015-01-16 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4070 [SPARK-4630][Core]Dynamically determine optimal number of partitions Stages in an application have different sizes of data. If the user does not set numPartitions for any stage, Spark will use the same

[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-19 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4051#issuecomment-70606552 When we set the initial number to min, there is a delay in requesting more executors, because in ExecutorAllocationManager's addExecutors, numExecutorsToAdd starts from 1

[GitHub] spark pull request: [SPARK-4630][Core]Dynamically determine optima...

2015-01-19 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4070#issuecomment-70489709 @rxin yes, some ETL jobs that have groupby and join operators have tried this feature. Most of the time it can determine the number of partitions very well

[GitHub] spark pull request: [SPARK-5266][Yarn]AM's numExecutorsFailed shou...

2015-01-15 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4061 [SPARK-5266][Yarn]AM's numExecutorsFailed should exclude number of killExecutor When the driver requests killExecutor, the AM will kill the container and numExecutorsFailed will increment. When

[GitHub] spark pull request: [SPARK-5266][Yarn]AM's numExecutorsFailed shou...

2015-01-15 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/4061

[GitHub] spark pull request: [SPARK-5266][Yarn]AM's numExecutorsFailed shou...

2015-01-15 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4061#issuecomment-70089056 I find that when the driver requests killExecutor, numExecutorsFailed does not increment, so I will close this PR.

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-22 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-71034833 Yes, this PR is for batch mode via a .py file, so I think yarn-client mode is enough for interactive /bin/pyspark. But sometimes a batch Python application needs to run

[GitHub] spark pull request: [SPARK-4654][CORE] Clean up DAGScheduler getMi...

2015-01-22 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4134#discussion_r23381894 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -349,34 +349,7 @@ class DAGScheduler( } private

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-20 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-70770431 Yes, we just specify the .py file or primaryResource file via spark-submit; with this PR we can make pyspark run in yarn-cluster mode. Example: spark-submit
