[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-05-10 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-100652864 @tgravescs sorry for my late reply. I think https://github.com/apache/spark/pull/6022/ works for SPARK-7485.

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-05-02 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-98372096 @tgravescs yes, I think some other unit tests failed. Jenkins, please test this.

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-29 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/5580#discussion_r29369790 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -341,6 +342,17 @@ private[spark] class Client( env

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-29 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/5580#discussion_r29370287 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -328,6 +328,46 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-29 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-97547542 @Sephiroth-Lin @tgravescs I have updated it and now zip the pyspark archives in mvn's pom.xml and sbt's build.scala.

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-29 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-97553254 @vanzin the code below is very important: pyArchives = pyArchives.split(",").map { localPath => val localURI = Utils.resolveURI(localPath
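
The fragment above is cut off in the archive; a minimal, self-contained sketch of the idea it describes is below. It resolves each comma-separated archive path, keeps local: URIs as on-node paths, and reduces other URIs to their file names; the helper name resolvePyArchives and the exact non-local handling are assumptions, not the merged SparkSubmit code.

    import java.io.File
    import java.net.URI

    // Hedged sketch only: mirrors the intent of the truncated fragment above.
    // "local:" archives are already present on every node, so their paths are kept;
    // other entries contribute only the file name, assuming the archive itself is
    // shipped to the cluster separately.
    def resolvePyArchives(pyArchives: String): String = {
      pyArchives.split(",").map { localPath =>
        val localURI = new URI(localPath)
        if (localURI.getScheme == "local") localURI.getPath
        else new File(localURI.getPath).getName
      }.mkString(File.pathSeparator)
    }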

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-27 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/5580#discussion_r29162587 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -328,6 +328,42 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-27 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/5580#discussion_r29169827 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -328,6 +328,42 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-27 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-96752326 @tgravescs yes, I agree with your comments and have updated it. Can you review it again? Thanks.

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-26 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-96369904 @andrewor14 for the second question, I added two things: one is that I add zipped pyspark archives to pyspark/lib when we build the Spark jar; the other is in submit

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-26 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-96369994 @tgravescs I think this PR will be useful for you; you can try it.

[GitHub] spark pull request: [SPARK-5687][Core]TaskResultGetter needs to ca...

2015-04-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4474#issuecomment-95907922 @pwendell I think we cannot kill the JVM directly when this occurs. When it is a Hive server, with one driver for many jobs, if we kill the JVM, the other jobs on this driver

[GitHub] spark pull request: [SPARK-5687][Core]TaskResultGetter needs to ca...

2015-04-23 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4474#issuecomment-95773327 @pwendell @andrewor14 I think I should reopen this PR, because I got this error yesterday when I used collect() from many executors; then the task-result-getter thread

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-20 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/5580#discussion_r28743589 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -328,6 +328,14 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-20 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-94609007 @andrewor14 The first question is why not just put it on --py-files? Because in yarn-client mode we cannot use --py-files, so if we put it on --py-files for yarn

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-19 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/5580 [SPARK-6869][PySpark] Add pyspark archives path to PYTHONPATH Based on https://github.com/apache/spark/pull/5478, which provides a PYSPARK_ARCHIVES_PATH env variable. With this PR, we just need to export

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-17 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-94111781 @Sephiroth-Lin I think I will later submit my PR based on this one; please help me review it then. Thanks.

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-16 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-93762922 Yes, I think in SparkSubmit we can automatically add PYSPARK_ARCHIVES_PATH to the distributed files, and then Client and ExecutorRunnable can set PYTHONPATH according

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-16 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-93873704 @sryza we can export PYSPARK_ARCHIVES_PATH=local://xx/pyspark.zip;local://xx/py4j.zip in spark-env.sh, and we can also export PYSPARK_ARCHIVES_PATH=hdfs://xx
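
The two comments above describe the same flow: SparkSubmit ships the archives named in PYSPARK_ARCHIVES_PATH, and Client/ExecutorRunnable then put them on PYTHONPATH in the container environment. A small sketch of that last step follows; the method name and the way existing values are merged are assumptions, not the actual Client or ExecutorRunnable code.

    import java.io.File
    import scala.collection.mutable

    // Hedged illustration: prepend the localized pyspark archives to PYTHONPATH in
    // the environment map handed to the YARN container.
    def addToPythonPath(env: mutable.Map[String, String], archives: Seq[String]): Unit = {
      val merged = archives ++ env.get("PYTHONPATH")
      env("PYTHONPATH") = merged.mkString(File.pathSeparator)
    }

    // Example with illustrative archive names:
    val env = mutable.Map[String, String]()
    addToPythonPath(env, Seq("pyspark.zip", "py4j.zip"))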

[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

2015-04-10 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5461#issuecomment-91741359 LGTM @andrewor14

[GitHub] spark pull request: [SPARK-2926][Shuffle]Add MR style sort-merge s...

2015-04-10 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3438#discussion_r28191449 --- Diff: core/src/main/scala/org/apache/spark/util/collection/TieredDiskMerger.scala --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2926][Shuffle]Add MR style sort-merge s...

2015-04-10 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3438#discussion_r28191464 --- Diff: core/src/main/scala/org/apache/spark/util/collection/TieredDiskMerger.scala --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-6556][Core] Fix wrong parsing logic of ...

2015-03-26 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5209#issuecomment-86504920 LGTM

[GitHub] spark pull request: [SPARK-5763][Core]add Sort-Merge Join to resol...

2015-03-24 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/5168 [SPARK-5763][Core]add Sort-Merge Join to resolve skewed data Add sort-merge join to resolve skewed data. I provide three interfaces to implement the join operator using SortMergeJoinRDD

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-03-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-85542318 @tgravescs thanks, I know what you mean. I will take a look at it.

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-03-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-85531984 @tgravescs does this mean that Spark has to be installed on all the nodes? That shouldn't be needed on YARN. Yes, currently we need to put the spark_home dir on all

[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...

2015-03-24 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4823#discussion_r27089597 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -42,6 +42,18 @@ private[ui] class AllJobsPage(parent: JobsTab) extends

[GitHub] spark pull request: [SPARK-5763][Core]Add Sort-Merge Join to resol...

2015-03-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5159#issuecomment-85416065 I will close this because my branch is spark-1.2; later I will use master to test and then address this.

[GitHub] spark pull request: [SPARK-5763][Core]Add Sort-Merge Join to resol...

2015-03-24 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/5159

[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...

2015-03-24 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4823#discussion_r27006782 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -42,6 +42,18 @@ private[ui] class AllJobsPage(parent: JobsTab) extends

[GitHub] spark pull request: [SPARK-5763][Core]Add Sort-Merge Join to resol...

2015-03-24 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/5159 [SPARK-5763][Core]Add Sort-Merge Join to resolve skewed data Add sort-merge join to resolve skewed data. @rxin @sryza

[GitHub] spark pull request: [SPARK-6103][Graphx]remove unused class to imp...

2015-03-01 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4846 [SPARK-6103][Graphx]remove unused class to import in EdgeRDDImpl The class TaskContext is unused in EdgeRDDImpl, so we need to remove it from the import list.

[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...

2015-02-28 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76584418 @srowen I have updated it for your comments. Can you take a look again? Thanks.

[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...

2015-02-27 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4823 [SPARK-4411][UI]Add kill link for jobs in the UI We should have a kill link for each job, similar to what we have for each stage, so it's easier for users to kill jobs in the UI

[GitHub] spark pull request: [SPARK-6058][Yarn] Log the user class exceptio...

2015-02-27 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4813#issuecomment-76362065 LGTM

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-24 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4363#discussion_r25312597 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -17,33 +17,84 @@ package org.apache.spark -import

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-75903702 I do not think a global default ratio is right, because in a job the size of each stage is different and the sizes are not monotonically increasing or decreasing. If we define

[GitHub] spark pull request: [SPARK-2213][SQL] Sort Merge Join

2015-02-15 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3173#issuecomment-74461781 @justinuang I think you may be interested in SPARK-5763.

[GitHub] spark pull request: [SPARK-5759][Yarn]ExecutorRunnable should catc...

2015-02-12 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4554#issuecomment-74034822 @sryza thanks. I think I can use SparkException to wrap the exception. Could you review this again?

[GitHub] spark pull request: [SPARK-5759][Yarn]ExecutorRunnable should catc...

2015-02-11 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4554 [SPARK-5759][Yarn]ExecutorRunnable should catch YarnException while NMClient start contain... Sometimes, for various reasons, an exception is thrown while NMClient starts
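
A minimal sketch of the change the title describes, catching YarnException from the NMClient start call and rethrowing it as a SparkException, is shown below; the method signature and parameter names are assumptions, and only the catch-and-wrap pattern reflects the PR.

    import org.apache.hadoop.yarn.api.records.{Container, ContainerLaunchContext}
    import org.apache.hadoop.yarn.client.api.NMClient
    import org.apache.hadoop.yarn.exceptions.YarnException
    import org.apache.spark.SparkException

    // Hedged sketch, not the merged ExecutorRunnable code: wrap the failure so the
    // caller sees a SparkException carrying the container id instead of a raw
    // YarnException escaping the launcher thread.
    def startContainerSafely(nmClient: NMClient,
                             container: Container,
                             ctx: ContainerLaunchContext): Unit = {
      try {
        nmClient.startContainer(container, ctx)
      } catch {
        case e: YarnException =>
          throw new SparkException(s"Exception while starting container ${container.getId}", e)
      }
    }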

[GitHub] spark pull request: SPARK-5613: Catch the ApplicationNotFoundExcep...

2015-02-11 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4392#issuecomment-74030105 I think for yarn-cluster mode we should also catch the ApplicationNotFoundException. @pwendell @andrewor14

[GitHub] spark pull request: [SPARK-5687][Core]TaskResultGetter needs to ca...

2015-02-10 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/4474

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-02-10 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r24406500 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -224,59 +240,90 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-4879] Use driver to coordinate Hadoop o...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r24315166 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -105,24 +106,61 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-5687][Core]TaskResultGetter need to cat...

2015-02-09 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4474 [SPARK-5687][Core]TaskResultGetter need to catch OutOfMemoryError. Because in enqueueSuccessfulTask another thread fetches the result, a large result may throw an OutOfMemoryError
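
A self-contained sketch of the behaviour the PR asks for follows: the result is fetched on the getter's thread pool, and an OutOfMemoryError raised while materializing a large result fails the task set instead of propagating and killing the driver JVM. The function and callback names are placeholders, not TaskResultGetter's actual API.

    import java.util.concurrent.ExecutorService

    // Hedged sketch only: `fetchResult` and `abortTaskSet` stand in for the real
    // TaskResultGetter internals and the scheduler callback.
    def enqueueSuccessfulTaskSketch(pool: ExecutorService)
                                   (fetchResult: () => Array[Byte],
                                    abortTaskSet: String => Unit): Unit = {
      pool.execute(new Runnable {
        override def run(): Unit = {
          try {
            fetchResult() // may allocate a very large buffer for a big task result
          } catch {
            case e: OutOfMemoryError =>
              // fail the task set with a message rather than crashing the whole JVM
              abortTaskSet(s"Failed to fetch task result: ${e.getMessage}")
          }
        }
      })
    }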

[GitHub] spark pull request: [SPARK-4665] [SPARK-4666] Improve YarnAllocato...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3525#issuecomment-73535044 Before, I thought both of the two configs could exist. From @tgravescs's comments I think overhead is more necessary than OverheadFraction, because at some time it has very

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-73541324 @tgravescs your thought is right, but the only difference is in YARN's internal Client; it works the same way for spark-submit with YARN client and YARN cluster, so I think

[GitHub] spark pull request: [SPARK-5529][CORE]Add expireDeadHosts in Heart...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4363#issuecomment-73626671 @andrewor14 @rxin yes, I agree with you. For other modes we will later need to implement killing executors, so this PR unifies failure detection between the blockmanager

[GitHub] spark pull request: [SPARK-5529][Core]Replace blockManager's timeo...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4367#issuecomment-73625959 Yes, I will close this PR. Thanks all.

[GitHub] spark pull request: [SPARK-5529][Core]Replace blockManager's timeo...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/4367

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r24313087 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -413,6 +418,7 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-5687][Core]TaskResultGetter needs to ca...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4474#issuecomment-73627335 @andrewor14 @pwendell yes, now we have conf.get("spark.driver.maxResultSize", "1g") to control the driver's memory, but once an OOM happens at TaskResultGetter

[GitHub] spark pull request: [SPARK-4665] [SPARK-4666] Improve YarnAllocato...

2015-02-09 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3525#issuecomment-73520398 I think memoryOverheadFraction sounds good to me.

[GitHub] spark pull request: [SPARK-5653][YARN] In ApplicationMaster rename...

2015-02-06 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4430 [SPARK-5653][YARN] In ApplicationMaster rename isDriver to isClusterMode In ApplicationMaster, rename isDriver to isClusterMode, because Client uses isClusterMode and ApplicationMaster should

[GitHub] spark pull request: SPARK-4337. [YARN] Add ability to cancel pendi...

2015-02-06 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4141#discussion_r24241097 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -124,10 +123,12 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: [SPARK-4879] Use driver to coordinate Hadoop o...

2015-02-06 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r24240721 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -148,6 +148,20 @@ case object TaskKilled extends TaskFailedReason

[GitHub] spark pull request: [SPARK-5636] Ramp up faster in dynamic allocat...

2015-02-05 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4409#issuecomment-73188040 LGTM

[GitHub] spark pull request: [SPARK-5593][Core]Replace BlockManagerListener...

2015-02-04 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4369 [SPARK-5593][Core]Replace BlockManagerListener with ExecutorListener in ExecutorAllocationListener More strictly, in ExecutorAllocationListener, we need to replace onBlockManagerAdded

[GitHub] spark pull request: [SPARK-5529][Core]Replace blockManager's timeo...

2015-02-04 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4367 [SPARK-5529][Core]Replace blockManager's timeoutChecking with executor's timeoutChecking The phenomenon is: blockManagerSlave times out and BlockManagerMasterActor will remove

[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...

2015-02-03 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4258#issuecomment-72780420 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-5530] Add executor container to executo...

2015-02-02 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4309#issuecomment-72455127 LGTM

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-02 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72578648 @andrewor14 thank you. About SPARK_HOME, we need to consider compatibility in two places. The first is communication between the Python context and the Scala context

[GitHub] spark pull request: [SPARK-5093] Set spark.network.timeout to 120s...

2015-02-02 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3903#discussion_r23982799 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -52,11 +52,7 @@ class BlockManagerMasterActor(val isLocal

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23904474 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -134,12 +136,29 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23904929 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -430,6 +430,10 @@ private[spark] class ApplicationMaster(args

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23905022 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -134,12 +136,29 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23904612 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala --- @@ -98,6 +121,9 @@ class YarnClusterSuite extends FunSuite

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72401201 For a Python application, if the SPARK_HOME of the submission node is different from that of the NodeManager, it cannot work in my test. Example: the submission node's version is 1.2

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23904494 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +292,20 @@ object SparkSubmit { // In yarn-cluster mode

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23904843 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -134,12 +136,29 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...

2015-01-31 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4292#discussion_r2348 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -820,7 +822,10 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-31 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23888903 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -808,6 +810,7 @@ class DAGScheduler( // will be posted, which

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-31 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72323941 @JoshRosen can you help me? I have now added a unit test for a Python application in yarn-cluster mode, but it fails. I think the reason is the environment

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-31 Thread lianhuiwang
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/3976

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-31 Thread lianhuiwang
GitHub user lianhuiwang reopened a pull request: https://github.com/apache/spark/pull/3976 [SPARK-5173]support python application running on yarn cluster mode Currently, when we run a Python application in yarn-cluster mode through spark-submit, spark-submit does not support python

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-31 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23888565 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -430,6 +430,10 @@ private[spark] class ApplicationMaster(args

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23884756 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +277,22 @@ object SparkSubmit { // In yarn-cluster mode

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23885383 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -165,6 +168,13 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...

2015-01-30 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4258#issuecomment-72298982 @sryza thank you. I have updated it for your comment. Can you review again?

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23885537 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -172,7 +172,8 @@ private[spark] class SparkSubmitArguments(args

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72302620 @sryza @andrewor14 thanks for your reviews. I have updated it with your comments. Later I will add a YarnClusterSuite test.

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-29 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23826046 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -327,8 +326,14 @@ object SparkSubmit { printStream.println(\n

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-28 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r23682980 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -226,50 +249,32 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-28 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r23682749 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -438,6 +444,7 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-28 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r23682774 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -470,6 +477,7 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-28 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r23683544 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -226,50 +249,32 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...

2015-01-28 Thread lianhuiwang
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4258 [SPARK-5470][Core]use defaultClassLoader to load classes of classesToRegister in KryoSeria... Currently KryoSerializer loads the classes in classesToRegister at the time of its initialization. When we
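
A short sketch of the fix the title points at: resolve the names in classesToRegister through the serializer's defaultClassLoader (falling back to the thread context loader) rather than whichever loader is active when KryoSerializer is constructed. The helper name is illustrative, not the actual KryoSerializer code.

    // Hedged sketch: load registered class names through the supplied default class
    // loader when one is set, otherwise through the current thread's context loader.
    def loadRegisteredClasses(classNames: Seq[String],
                              defaultClassLoader: Option[ClassLoader]): Seq[Class[_]] = {
      val loader = defaultClassLoader.getOrElse(Thread.currentThread().getContextClassLoader)
      classNames.map(name => Class.forName(name, false, loader))
    }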

[GitHub] spark pull request: [SPARK-4955]With executor dynamic scaling enab...

2015-01-28 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3962#issuecomment-71953371 @andrewor14 thanks for your help.

[GitHub] spark pull request: [SPARK-4955]With executor dynamic scaling enab...

2015-01-28 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3962#discussion_r23740893 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -231,6 +231,25 @@ private[spark] class ApplicationMaster(args

[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext

2015-01-27 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4207#issuecomment-71633632 Firstly, we need to rename the PR title to [SPARK-5324][SQL] Implement Describe Table for SQLContext. I find that the code of this PR is not the latest code. I do not

[GitHub] spark pull request: [SPARK-4955]With executor dynamic scaling enab...

2015-01-26 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3962#issuecomment-71574630 @andrewor14 if we don't finish with SUCCEEDED on driver disassociation, the AM should finish with a non-zero status. Example: if the driver's main class throws some exception and exits

[GitHub] spark pull request: [SPARK-4955]With executor dynamic scaling enab...

2015-01-26 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3962#discussion_r23581653 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -231,6 +231,25 @@ private[spark] class ApplicationMaster(args

[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...

2015-01-26 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4055#issuecomment-71434453 @JoshRosen I think that's OK, because the code change is very small and it does not affect the current logic.

[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4051#issuecomment-71362007 @sryza @andrewor14 I find that setting minExecutors to initialExecutors is best for the following situation: when DAGScheduler submits the missing tasks of the first stage
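
For context, a hedged example of the dynamic-allocation settings under discussion; the spark.dynamicAllocation.* keys are the family being debated in SPARK-4585, the values are placeholders, and initialExecutors is aligned to minExecutors as the comment suggests.

    import org.apache.spark.SparkConf

    // Illustrative values only: the allocation starts from the minimum rather than
    // from zero when the first stage is submitted.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "10")
      .set("spark.dynamicAllocation.initialExecutors", "10") // assumed key from SPARK-4585
      .set("spark.dynamicAllocation.maxExecutors", "100")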

[GitHub] spark pull request: [SPARK-4955]With executor dynamic scaling enab...

2015-01-25 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3962#discussion_r23511995 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala --- @@ -94,12 +91,14 @@ private[spark] abstract class

[GitHub] spark pull request: [SPARK-4955]With executor dynamic scaling enab...

2015-01-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3962#issuecomment-71411565 In cluster mode, the AMActor does not need to subscribe to the disassociated event, because sometimes the driver has errors; currently the AMActor does not understand what happened

[GitHub] spark pull request: [SPARK-4955]With executor dynamic scaling enab...

2015-01-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3962#issuecomment-71422216 @andrewor14 I have looked at it in depth. YarnSchedulerActor works well in both yarn-cluster and yarn-client mode, and I have tested it in both modes. Now we

[GitHub] spark pull request: [SPARK-4955]With executor dynamic scaling enab...

2015-01-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3962#issuecomment-71404828 @andrewor14 do you have any feedback about this PR? Thanks.

[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-25 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4051#issuecomment-71404736 @andrewor14 I think replacing maxExecutors with --num-executors is more reasonable, because when dynamic allocation is not enabled, --num-executors

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-24 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r23502021 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -199,14 +199,31 @@ private[spark] class ExecutorAllocationManager
