[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r111546462 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -168,6 +169,8 @@ private[spark] class TaskSetManager

[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 @squito Thank you so much for reviewing thus far and sorry for the complexity I bring in. I tried to simplify the code according to your comment and please take another look when tests

[GitHub] spark issue #17603: [SPARK-20288] Avoid generating the MapStatus by stageId ...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17603 @squito Could you help comment on this ? :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 @squito @srowen Could you help comment on this :)

[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 I think the failed unit test can be fixed in https://github.com/apache/spark/pull/17634 and https://github.com/apache/spark/pull/17603

[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Jenkins, test this please

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-04-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r111734780 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -133,36 +135,53 @@ private[spark] class HighlyCompressedMapStatus private

[GitHub] spark pull request #17744: [SPARK-20426] Lazy initialization of FileSegmentM...

2017-04-24 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17744 [SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shuffle service. ## What changes were proposed in this pull request? When application contains large amount of shuffle

[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-04-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17744 Spark jobs run on a YARN cluster in my warehouse. We enabled the external shuffle service (--conf spark.shuffle.service.enabled=true). Recently the NodeManager has been running out of memory now and then. Dumping

[GitHub] spark pull request #17744: [SPARK-20426] Lazy initialization of FileSegmentM...

2017-04-25 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17744#discussion_r113356306 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -93,14 +92,25 @@ protected void

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-04-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 @kayousterhout @mridulm Does this pr make sense? Could you please take a look at this when you have time :)

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114503489 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -100,7 +114,14 @@ public void

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114503557 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +147,38 @@ private void

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114503627 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -133,36 +135,53 @@ private[spark] class HighlyCompressedMapStatus private

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114504511 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -42,6 +46,12 @@ private[spark] class BlockStoreShuffleReader

[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17744 @tgravescs Thanks a lot for merging. I proposed to resolve this by "Lazy initialization of FileSegmentManagedBuffer" and simplify the change. But after checking the code, could

[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17744 Thanks again for helping review this pr. Currently I'm not seeing any memory issues on my NodeManagers. I'll report to the community if there are new findings :)

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114696480 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -100,7 +114,14 @@ public void

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Jenkins, retest this please.

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114943530 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -100,7 +114,14 @@ public void

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114959211 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +151,39 @@ private void

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114960041 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114960285 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +206,18 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114961196 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +181,41 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114965477 --- Diff: core/src/test/scala/org/apache/spark/shuffle/BlockStoreShuffleReaderSuite.scala --- @@ -126,11 +131,21 @@ class BlockStoreShuffleReaderSuite

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114965327 --- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala --- @@ -128,4 +130,23 @@ class MapStatusSuite extends SparkFunSuite

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114965950 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +424,74 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114965988 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +424,74 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114967741 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Really really thankful for reviewing this pr:). I've refined according to your comments. Please take another look at this when you have time.

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115018506 --- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala --- @@ -128,4 +130,23 @@ class MapStatusSuite extends SparkFunSuite

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115021013 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128497296 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java --- @@ -145,7 +172,12 @@ private void

[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128498015 --- Diff: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java --- @@ -257,4 +257,7 @@ public Properties cryptoConf

[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128519475 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java --- @@ -145,7 +172,12 @@ private void

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 @cloud-fan I understand your concern. A `TransportRequestHandler` is for a channel/connection. We want to track the sending chunks of all connections. So I guess we must have a manager for

[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-21 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128794455 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/StreamManager.java --- @@ -83,4 +83,16 @@ public void connectionTerminated

[GitHub] spark pull request #18713: [SPARK-21509][SQL] Add a config to enable adaptiv...

2017-07-22 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18713 [SPARK-21509][SQL] Add a config to enable adaptive query execution only for the last que… ## What changes were proposed in this pull request? Feature of adaptive query execution is a good

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 Jenkins, retest this please.

[GitHub] spark issue #18713: [SPARK-21509][SQL] Add a config to enable adaptive query...

2017-07-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18713 cc @cloud-fan @jiangxb1987

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-25 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 Thanks for merging !

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-25 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 @tgravescs Thanks for help. > I think we should expand the description of the config to say what happens when the limit is hit. Since its not using real flow control a user might

[GitHub] spark pull request #18735: [SPARK-21530] Update description of spark.shuffle...

2017-07-25 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18735 [SPARK-21530] Update description of spark.shuffle.maxChunksBeingTransferred. ## What changes were proposed in this pull request? Update the description of

[GitHub] spark issue #18735: [SPARK-21530] Update description of spark.shuffle.maxChu...

2017-07-25 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18735 cc @tgravescs @cloud-fan

[GitHub] spark pull request #18735: [SPARK-21530] Update description of spark.shuffle...

2017-07-26 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18735#discussion_r129608114 --- Diff: docs/configuration.md --- @@ -636,6 +636,8 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark issue #18735: [SPARK-21530] Update description of spark.shuffle.maxChu...

2017-07-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18735 @tgravescs Thanks, I should be more careful :)

[GitHub] spark issue #18713: [SPARK-21509][SQL] Add a config to enable adaptive query...

2017-07-27 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18713 Ok, I will close this for now.

[GitHub] spark pull request #18713: [SPARK-21509][SQL] Add a config to enable adaptiv...

2017-07-27 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/18713

[GitHub] spark pull request #18866: [SPARK-21649][SQL] Support writing data into hive...

2017-08-06 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18866 [SPARK-21649][SQL] Support writing data into hive bucket table. ## What changes were proposed in this pull request? Support writing hive bucket table. Spark internally uses Murmur3Hash

[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-06 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18866 I added the unit test referring (https://github.com/apache/hive/blob/branch-1/ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java#L393). Hive will sort bucket files by

[GitHub] spark pull request #18866: [SPARK-21649][SQL] Support writing data into hive...

2017-08-07 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18866#discussion_r131607680 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -262,7 +262,12 @@ case class

[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18866 Jenkins, retest this please.

[GitHub] spark pull request #18866: [SPARK-21649][SQL] Support writing data into hive...

2017-08-07 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18866#discussion_r131682897 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala --- @@ -534,4 +534,29 @@ class InsertIntoHiveTableSuite

[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18866 @viirya Please take another look when you have time. I've already updated :)

[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18866 cc @cloud-fan Would you mind giving some comments? I can keep working on this :)

[GitHub] spark issue #17533: [SPARK-20219] Schedule tasks based on size of input from...

2017-06-02 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 @HyukjinKwon Sorry, I will close this for now and make another pr if there's progress.

[GitHub] spark pull request #17533: [SPARK-20219] Schedule tasks based on size of inp...

2017-06-02 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/17533

[GitHub] spark pull request #18204: [SPARK-20985] sc.stop should be encapsulated in f...

2017-06-05 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18204 [SPARK-20985] sc.stop should be encapsulated in finally ## What changes were proposed in this pull request? Stop `SparkContext` in `finally`, thus other tests won't complain that th

[GitHub] spark issue #18204: [SPARK-20985] Stop SparkContext using LocalSparkContext....

2017-06-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18204 @srowen Thanks for approving ! :-)

[GitHub] spark pull request #18211: [WIP][SPARK-20994] Alleviate memory pressure in S...

2017-06-05 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18211 [WIP][SPARK-20994] Alleviate memory pressure in StreamManager ## What changes were proposed in this pull request? In current code, chunks are fetched from shuffle service in two steps

[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...

2017-06-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 In my cluster, we are suffering from OOM of shuffle-service. We found that a lot of executors are fetching blocks from a single shuffle-service. Analyzing the memory, we found that the

[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...

2017-06-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 In this pr: 1. Instead of `chunkIndex`, fetch chunk by `String chunkId`. Server doesn't cache the blocks list. 2. In `OpenBlocks`, only metadata(e.g. appId, executorId) of the stre

[GitHub] spark issue #18204: [SPARK-20985] Stop SparkContext using LocalSparkContext....

2017-06-06 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18204 Thanks for merging

[GitHub] spark pull request #18211: [WIP][SPARK-20994] Alleviate memory pressure in S...

2017-06-06 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18211#discussion_r120388659 --- Diff: common/network-common/src/test/java/org/apache/spark/network/server/OneForOneStreamManagerSuite.java --- @@ -1,50 +0,0

[GitHub] spark issue #18211: [SPARK-20994] Alleviate memory pressure in StreamManager

2017-06-06 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 Jenkins, retest this please.

[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...

2017-06-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 @vanzin Thanks a lot for the comment. I will close this pr and think about whether there is another solution.

[GitHub] spark pull request #18211: [WIP][SPARK-20994] Alleviate memory pressure in S...

2017-06-07 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/18211

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 In my cluster, we are suffering from OOM of shuffle-service. We found that a lot of executors are fetching blocks from a single shuffle-service. Analyzing the memory, we found that the

[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-07 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18231 [WIP][SPARK-20994] Remove reduant characters in OpenBlocks to save memory for shuffle service. ## What changes were proposed in this pull request? In current code, blockIds in

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @srowen Thanks a lot looking into this :) For example: blockId="shuffle_20_1000_2000", it is stored as an `String`, which costs more than 20 bytes. In this change, it will c
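
The idea in the comment above can be illustrated with a minimal sketch. It assumes a shuffle block id has the form "shuffle_<shuffleId>_<mapId>_<reduceId>", as in the example "shuffle_20_1000_2000", and keeps the three numbers as ints instead of holding the whole String. The class and method names (BlockIdCodec, parse, format) are hypothetical and are not taken from the pull request; the actual change may differ.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only; not the code from the pull request.
final class BlockIdCodec {
    private BlockIdCodec() {}

    // Parses "shuffle_<shuffleId>_<mapId>_<reduceId>" into {shuffleId, mapId, reduceId}.
    public static int[] parse(String blockId) {
        String[] parts = blockId.split("_");
        if (parts.length != 4 || !parts[0].equals("shuffle")) {
            throw new IllegalArgumentException("Unexpected shuffle block id: " + blockId);
        }
        return new int[] {
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2]),
            Integer.parseInt(parts[3])
        };
    }

    // Rebuilds the textual block id on demand from the three numeric components.
    public static String format(int[] ids) {
        return "shuffle_" + ids[0] + "_" + ids[1] + "_" + ids[2];
    }

    public static void main(String[] args) {
        int[] ids = parse("shuffle_20_1000_2000");
        System.out.println(Arrays.toString(ids)); // prints [20, 1000, 2000]
        System.out.println(format(ids));          // prints shuffle_20_1000_2000
    }
}
```

Three ints carry 12 bytes of payload, while the 20-character String needs at least one byte per character plus object overhead, so the numeric form is smaller when millions of ids are retained.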

[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-07 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r120808962 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,47 @@ private

[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-07 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r120809215 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,47 @@ private

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @vanzin Thanks a lot for reviewing this. I refined according to your comments. Please take another look at this when you have time :)

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 Actually it's more than 12 bytes. Yes, there are millions of these. In my heap dump, it's 1.5 G

[GitHub] spark pull request #18239: [SPARK-19462] fix bug in Exchange--pass in a tmp ...

2017-06-08 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18239 [SPARK-19462] fix bug in Exchange--pass in a tmp "newPartitioning" in "prepareShuffleDependency" When `spark.sql.adaptive.enabled` is true, any rerunning of ancestors of `

[GitHub] spark issue #18239: [SPARK-19462] fix bug in Exchange--pass in a tmp "newPar...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18239 I'm not sure if it is appropriate to make this pr and backport to 1.6. It would be great if someone could take some time to review this.

[GitHub] spark issue #17276: [WIP][SPARK-19937] Collect metrics of block sizes when s...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @mridulm @squito Thanks a lot for taking the time to review this pr. I will close it for now and make another one if there is progress.

[GitHub] spark pull request #17276: [WIP][SPARK-19937] Collect metrics of block sizes...

2017-06-08 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/17276

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @srowen Sorry, I didn't make it clear. 1. In current code, all blockIds are stored in the iterator. They are released only when the iterator is traversed. 2. Now I change the `Strin

[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r120844431 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,52 @@ private

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 I mean the blockIds in `OpenBlocks`; they are referenced by the iterator.

[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r120845706 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,52 @@ private

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 The blockIds cannot be freed because they are referenced in the iterator. In current change they are not. We reference the mapIdAndReduceIds instead. Thus the blockIds in OpenBlocks can be

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 Nothing references `msg` anywhere, right? I guess `msg` will be garbage collected without trouble.

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 Yes, I think it would be great to run some tests and provide solid evidence.

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove reduant characters in OpenBloc...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @srowen I did a test to verify this patch. I wrap a number of blocks inside `OpenBlocks` and send it to `ExternalShuffleBlockHandler`. With this change: it cost about 133M in the

[GitHub] spark pull request #18249: [WIP][SPARK-19937] Collect metrics for remote byt...

2017-06-09 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18249 [WIP][SPARK-19937] Collect metrics for remote bytes read to disk during shuffle. In current code(https://github.com/apache/spark/pull/16989), big blocks are shuffled to disk. This pr

[GitHub] spark issue #18239: [SPARK-19462] fix bug in Exchange--pass in a tmp "newPar...

2017-06-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18239 @jiangxb1987 would you mind taking a look at this?

[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove reduant characters in O...

2017-06-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r121241362 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,51 @@ private

[GitHub] spark pull request #18231: [SPARK-20994] Remove reduant characters in OpenBl...

2017-06-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r121242495 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,51 @@ private

[GitHub] spark issue #18231: [SPARK-20994] Remove reduant characters in OpenBlocks to...

2017-06-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @vanzin Thanks again for comments :) I refined accordingly, please take another look when you have time.

[GitHub] spark issue #18239: [SPARK-19462] fix bug in Exchange--pass in a tmp "newPar...

2017-06-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18239 Very gentle ping @jiangxb1987 It would be great if you can take a look when you have time.

[GitHub] spark issue #18239: [SPARK-19462] fix bug in Exchange--pass in a tmp "newPar...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18239 In master branch, there's no such issue. I think the scenario described in jira is a good case. And I will add a test case in the pr. Our product env is based on spark-1.6. So I made th

[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r122237095 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,51 @@ private

[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r122238244 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,51 @@ private

[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r122238486 --- Diff: common/network-shuffle/src/test/java/org/apache/spark/network/sasl/SaslIntegrationSuite.java --- @@ -202,7 +202,7 @@ public void

[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r122240056 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,51 @@ private

[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r122240746 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java --- @@ -150,27 +150,20 @@ public void

[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @jiangxb1987 Thanks a lot for taking the time to review this PR. More comments are welcome.

[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @cloud-fan Thanks a lot for taking the time to review this. I refined accordingly :)
