[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2018-10-21 Thread zhzhan
Github user zhzhan closed the pull request at: https://github.com/apache/spark/pull/15541 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20480: [Spark-23306] Fix the oom caused by contention

2018-02-01 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/20480 [Spark-23306] Fix the oom caused by contention ## What changes were proposed in this pull request? There is a race condition in TaskMemoryManager, which may cause OOM. The memory

[GitHub] spark issue #17180: [SPARK-19839][Core]release longArray in BytesToBytesMap

2017-07-27 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/17180 retest it please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #17180: [SPARK-19839][Core]release longArray in BytesToBytesMap

2017-07-26 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/17180 Will fix the unit test.

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-26 Thread zhzhan
Github user zhzhan closed the pull request at: https://github.com/apache/spark/pull/18694

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-26 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/18694 Closing the PR; will work on adding a close interface for the iterators used in Spark SQL to remove the extra overhead.

[GitHub] spark issue #17180: [SPARK-19839][Core]release longArray in BytesToBytesMap

2017-07-24 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/17180 The test failure is caused by calling a method on the map after `destructiveIterator()` has been called, which is illegal by definition. https://github.com/apache/spark/blob/master/core/src

[GitHub] spark issue #17180: [SPARK-19839][Core]release longArray in BytesToBytesMap

2017-07-24 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/17180 Per review comments, release the longArray on destructive iterator creation.

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-21 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/18694 Currently the patch helps scenarios such as Join(A, Join(B, C)). It is critical for us because we have some internal development in which each stage may consist of tens of sort operators. We

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-21 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/18694 If we assume the pipeline is simple enough that each stage has only one operator that needs to spill, you are right. But if the pipeline is more complex, for example if multiple operators need to spill

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/18694 The cleanup hook is used after the task is done. The diff solves the leak for SortMergeJoin only and does not apply to the limit case. Limit is another special case and needs to be taken care of separately

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128683903 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/18694 The memory leak happens in the following scenario. For example, in an inner join, once the left side is exhausted we stop advancing the right side. Because the right side has not reached the end, the memory

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128679491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/18694 [SPARK-21492][SQL] Memory leak in SortMergeJoin ## What changes were proposed in this pull request? Fix the memory leak in SortMergeJoin ## How was this patch tested? Relies on existing

[GitHub] spark pull request #17180: [SPARK-19839][Core]release longArray in BytesToBy...

2017-03-06 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/17180 [SPARK-19839][Core]release longArray in BytesToBytesMap ## What changes were proposed in this pull request? When BytesToBytesMap spills, its longArray should be released. Otherwise, it may

[GitHub] spark issue #17155: [SPARK-19815][SQL] Not orderable should be applied to ri...

2017-03-03 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/17155 @gatorsmile Thanks for reviewing this. I am rethinking the logic. On the surface, the logic may be correct, since in the join the left and right keys should be the same type. Will close the PR

[GitHub] spark pull request #17155: [SPARK-19815][SQL] Not orderable should be applie...

2017-03-03 Thread zhzhan
Github user zhzhan closed the pull request at: https://github.com/apache/spark/pull/17155

[GitHub] spark pull request #17155: [SPARK-19815][SQL] Not order able should be appli...

2017-03-03 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/17155 [SPARK-19815][SQL] Not order able should be applied to right key instead of left key ## What changes were proposed in this pull request? Change the orderable condition. ## How

[GitHub] spark issue #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeRowArray...

2017-02-14 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/16909 @hvanhovell @davies Correct me if I am wrong. My understanding is that the following code will go through all matching rows on the right side and put them into the BufferedRowIterator. If there is OOM

[GitHub] spark issue #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeRowArray...

2017-02-13 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/16909 @tejasapatil Do you want to fix the BufferedRowIterator for WholeStageCodegenExec as well? As for inner join, the LinkedList currentRows would cause the same issue, as it buffers the rows from inner

[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-08 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91570259 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala --- @@ -487,6 +489,26 @@ class HiveUDFSuite extends QueryTest

[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-08 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91569919 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala --- @@ -487,6 +489,26 @@ class HiveUDFSuite extends QueryTest

[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-06 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91142141 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala --- @@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest

[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-05 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91026585 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala --- @@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest

[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-05 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r91026433 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala --- @@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest

[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-03 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r90763121 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala --- @@ -487,6 +488,29 @@ class HiveUDFSuite extends QueryTest

[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-12-03 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/16068 @gatorsmile We cannot use deterministic = true/false, as there are existing UDFs with deterministic set to true and stateful set to true as well.

[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-12-03 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/16068 My understanding is that a non-deterministic UDF does not need to be stateful, but a stateful UDF has to be non-deterministic. Here are the comments in Hive regarding this property

[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-12-01 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/16068 @hvanhovell Would you like to take a look and let me know if you have any concerns?

[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-11-30 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/16068 @hvanhovell Thanks for looking at this. We have a large number of UDFs that have this issue. For example, the UDF gives different results with different partitioning/sorting, but the UDF is pushed down before

[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-11-30 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/16068 retest it please

[GitHub] spark pull request #16068: stateful udf should be nondeterministic

2016-11-29 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/16068 stateful udf should be nondeterministic ## What changes were proposed in this pull request? Make stateful UDFs nondeterministic ## How was this patch tested? Mainly

[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-11-01 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15541 @rxin Thanks for the feedback regarding the TaskAssigner API. The current API is designed based on the current logic of TaskSchedulerImpl, where the scheduler takes many rounds to assign the tasks

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-11-01 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r85985739 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-25 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15541 @rxin Would you like to take a look and let me know if you have any concerns? Thanks.

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84621076 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -250,24 +251,24 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84619879 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -250,24 +251,24 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-23 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84619023 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-21 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84424034 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-20 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15541 @gatorsmile I didn't see your new comments

[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-20 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15541 @rxin Can you please take a look, and let me know if you have any concerns?

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84158910 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84129486 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala --- @@ -109,6 +108,85 @@ class TaskSchedulerImplSuite extends

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84119714 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84002685 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84002480 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala --- @@ -109,6 +108,85 @@ class TaskSchedulerImplSuite extends

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84002353 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84002236 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-18 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r83999756 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-18 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r83998058 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-18 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r83997070 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-18 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15541 @rxin @gatorsmile Can you please take a look and kindly provide your comments?

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-18 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/15541 [SPARK-17637][Scheduler]Packed scheduling for Spark tasks across executors ## What changes were proposed in this pull request? Restructure the code and implement two new task assigner

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-18 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 @wangmiao1981 Thanks for reviewing this. I will open another PR addressing these comments soon.

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-16 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 @rxin Thanks a lot for the detailed review. I will update the patch.

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-15 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 @mridulm Thanks for reviewing this.

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-09 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 @mridulm You are right. This patch is mainly for jobs that have multiple stages, which is very common in production pipelines. As you mentioned, if there is a shuffle involved

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-07 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 @mridulm Thanks for the comments. Your concern regarding the locality is right. The patch does not change this behavior, which gives priority to locality preference. But if multiple executors

[GitHub] spark pull request #15218: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-06 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15218#discussion_r82321008 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15218: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-06 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/15218#discussion_r82290564 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-04 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 @mridulm Thanks for reviewing this. Will wait for a while in case there are more comments before addressing them.

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-09-23 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 @gatorsmile Thanks. #65832 is the latest one which does not have the same failure.

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-09-23 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 retest please

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-09-23 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15218 Failed in DirectKafkaStreamSuite. It should have nothing to do with the patch.

[GitHub] spark pull request #15218: [Spark-17637][Scheduler]Packed scheduling for Spa...

2016-09-23 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/15218 [Spark-17637][Scheduler]Packed scheduling for Spark tasks across executors ## What changes were proposed in this pull request? Restructure the code and implement two new task assigner

[GitHub] spark issue #15080: [SPARK-17526][Web UI]: Display the executor log links wi...

2016-09-14 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/15080 @srowen Thanks for reviewing this. Any suggestions to improve it are welcome. It bothers us a lot not being able to locate the debug log quickly in production.

[GitHub] spark pull request #15080: SPARK-17526: add log links in job failures

2016-09-13 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/15080 SPARK-17526: add log links in job failures ## What changes were proposed in this pull request? Add the executor log links with the job failure message on Spark UI and Console ## How

[GitHub] spark pull request: [SPARK-15441][SQL] support null object in oute...

2016-05-30 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/13322#issuecomment-222432192 My understanding is that this newly added hidden column is mainly for serializing/deserializing objects to/from rows. How would you leverage it to solve the outer join case where the null

[GitHub] spark pull request: SPARK-12417. [SQL] Orc bloom filter options ar...

2015-12-18 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/10375#issuecomment-165844369 Any test cases to make sure it works as expected? Do you mind changing ORC PPD to be enabled by default, or using another JIRA for that?

[GitHub] spark pull request: [SPARK-11562][SQL] Provide user an option to i...

2015-11-08 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/9553#discussion_r44243045 --- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala --- @@ -78,16 +79,21 @@ object Main extends Logging

[GitHub] spark pull request: [SPARK-11562][SQL] Provide user an option to i...

2015-11-08 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/9553#issuecomment-154943558 This new configuration needs a documentation update.

[GitHub] spark pull request: [SPARK-11562][SQL] Provide user an option to i...

2015-11-08 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/9553#discussion_r44242983 --- Diff: repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala --- @@ -132,6 +132,7 @@ class SparkILoop( @DeveloperApi var

[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

2015-10-27 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r43204781 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala --- @@ -142,6 +145,97 @@ class YarnSparkHadoopUtil extends

[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

2015-09-18 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/8799#issuecomment-141355072 LGTM. Thanks for fixing this.

[GitHub] spark pull request: [SPARK-10623][SQL]: fix the predicate pushdown...

2015-09-17 Thread zhzhan
Github user zhzhan closed the pull request at: https://github.com/apache/spark/pull/8783

[GitHub] spark pull request: [SPARK-10623][SQL]: fix the predicate pushdown...

2015-09-17 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/8783#issuecomment-141215436 @liancheng Thanks for the review. Since https://github.com/apache/spark/pull/8799 has been opened and also fixes another issue, I will close this one.

[GitHub] spark pull request: [SPARK-10623][SQL]: fix the predicate pushdown...

2015-09-16 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/8783 [SPARK-10623][SQL]: fix the predicate pushdown construction The predicate pushdown is not working because the construction is wrong. Fix it with startAnd/end.

[GitHub] spark pull request: [SPARK-10623][SQL]: fix the predicate pushdown...

2015-09-16 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/8783#issuecomment-140941698 @liancheng @marmbrus Can you help to review it?

[GitHub] spark pull request: [SPARK-10304][SQL]: throw error when the table...

2015-09-01 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/8547#issuecomment-136610247 Adding a PartitionValues.empty does not cover all problems. Will close this PR and investigate other approaches.

[GitHub] spark pull request: [SPARK-10304][SQL]: throw error when the table...

2015-09-01 Thread zhzhan
Github user zhzhan closed the pull request at: https://github.com/apache/spark/pull/8547

[GitHub] spark pull request: [SPARK-10304][SQL]: throw error when the table...

2015-09-01 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/8547#discussion_r38391122 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -436,7 +436,8 @@ abstract class HadoopFsRelation private[sql

[GitHub] spark pull request: [SPARK-10304][SQL]: throw error when the table...

2015-08-31 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/8547 [SPARK-10304][SQL]: throw error when the table directory is invalid Throw an error if the directory of a table is invalid. A directory is valid when either all files in it are partitioned, or none

[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-08-27 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135323382 @viirya Take a quick second look at the issue. As @chenghao-intel mentioned, normalizing the name (to lower case) is the default behavior. Should we fix

[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-08-27 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135324209 Also we need to change private lazy val nameToField: Map[String, StructField] = fields.map(f => f.name.toLowerCase -> f).toMap
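
The lower-cased `nameToField` map quoted in the comment above can be illustrated with a small, self-contained sketch. This is not Spark's code: the `StructField` and `StructType` classes here are simplified, hypothetical stand-ins, and the `get` method is an assumption added only to show how a lookup would have to lower-case the requested name as well.

```scala
// Hypothetical, simplified stand-in for Spark's StructField (illustration only).
final case class StructField(name: String, dataType: String)

// Hypothetical, simplified stand-in for Spark's StructType (illustration only).
final case class StructType(fields: Seq[StructField]) {
  // Same expression as in the comment: keys are the lower-cased field names.
  private lazy val nameToField: Map[String, StructField] =
    fields.map(f => f.name.toLowerCase -> f).toMap

  // Assumed helper: the requested name is lower-cased too, so lookups ignore case.
  def get(name: String): Option[StructField] =
    nameToField.get(name.toLowerCase)
}

object NameToFieldSketch {
  def main(args: Array[String]): Unit = {
    val schema = StructType(Seq(StructField("UserId", "int"), StructField("name", "string")))
    println(schema.get("userid"))   // lookup that differs from the declared name only by case
    println(schema.get("missing"))  // lookup of a field that does not exist
  }
}
```

One consequence of this design is that two fields whose names differ only by case collapse into a single map entry, with the later field winning, so a change like this only makes sense where names are already normalized.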

[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...

2015-08-27 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135332239 @liancheng has more insight into this part.

[GitHub] spark pull request: [SPARK-9613] [CORE] [WIP] Ban use of JavaConve...

2015-08-17 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/8033#issuecomment-131959546 @srowen It seems that the mapping got messed up. I don't have a clue why yet, and I didn't find any obvious reason why the patch would break the test. I will dig more

[GitHub] spark pull request: [SPARK-9613] [CORE] [WIP] Ban use of JavaConve...

2015-08-17 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/8033#issuecomment-131960610 @srowen You can probably revert the change in sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala

[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-08-03 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-127465813 LGTM. Will let @liancheng take a final look.

[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-07-23 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/7520#discussion_r35396110 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -86,19 +86,10 @@ private[orc] class OrcOutputWriter

[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-07-23 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/7520#discussion_r35395830 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -120,15 +111,11 @@ private[orc] class OrcOutputWriter

[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-07-23 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-124335510 LGTM with the comments answered or resolved.

[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-07-23 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/7520#discussion_r35337326 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -85,18 +85,11 @@ private[orc] class OrcOutputWriter

[GitHub] spark pull request: [SPARK-8501] [SQL] Avoids reading schema from ...

2015-07-02 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/7200#issuecomment-118190051 Some minor comments. Overall, LGTM.

[GitHub] spark pull request: [SPARK-8501] [SQL] Avoids reading schema from ...

2015-07-02 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/7200#issuecomment-118187344 @liancheng In Spark, we do not create the ORC file if there are no records, so this only happens with ORC files created by Hive, right?

[GitHub] spark pull request: [SPARK-8501] [SQL] Avoids reading schema from ...

2015-07-02 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/7200#discussion_r33831074 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -24,30 +24,58 @@ import org.apache.hadoop.hive.serde2

[GitHub] spark pull request: [Spark-5111][SQL]HiveContext and Thriftserver ...

2015-06-17 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/4064#issuecomment-112875139 @WangTaoTheTonic The problem happens with spark-1.3 and hadoop-2.6 in a Kerberos cluster. With hive-0.14 support, I suppose the problem may be gone, but I didn't verify

[GitHub] spark pull request: [Spark-5111][SQL]HiveContext and Thriftserver ...

2015-06-17 Thread zhzhan
Github user zhzhan closed the pull request at: https://github.com/apache/spark/pull/4064

[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...

2015-06-16 Thread zhzhan
Github user zhzhan closed the pull request at: https://github.com/apache/spark/pull/5637

[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...

2015-06-16 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/5637#issuecomment-112615055 Closing this PR, as it may be outdated relative to the latest Spark upstream and may no longer work.
