[GitHub] spark pull request: [SPARK-3444] Provide an easy way to change log...

2015-05-01 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5791#discussion_r29491895 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -343,6 +343,15 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-05-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-98059787 @sarutak thanks for updating this - I think I can merge it soon. I will create JIRAs for some follow up. Actually it would be nice if you could submit a follow up

[GitHub] spark pull request: [SPARK-3444] Provide an easy way to change log...

2015-05-01 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5791#discussion_r29492392 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -343,6 +343,15 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-7224] added mock repository generator f...

2015-05-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5790#issuecomment-98220430 Thanks tom I reverted this. On Fri, May 1, 2015 at 11:23 AM, Tom Graves notificati...@github.com wrote: filed https://issues.apache.org/jira/browse

[GitHub] spark pull request: SPARK-5112. Expose SizeEstimator as a develope...

2015-05-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3913#issuecomment-98071817 I'm good with the new idea you proposed. We keep `util.SizeEstimator` as is, but we have a public class in the root namespace `SizeEstimator` that just has a single

[GitHub] spark pull request: redir stderr better, remove unused code, bette...

2015-05-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5817#issuecomment-98239473 Hey Shane - thanks for sending this. We're right in the middle of a big feature rush for Spark 1.4 so we're not looking at any non essential patches ATM, and especially

[GitHub] spark pull request: [SPARK-3444] Provide an easy way to change log...

2015-05-01 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5791#issuecomment-98239959 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: [SPARK-7139][Streaming] Allow received block m...

2015-05-01 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5732#discussion_r29536958 --- Diff: core/src/main/scala/org/apache/spark/rdd/BlockRDD.scala --- @@ -66,7 +67,9 @@ class BlockRDD[T: ClassTag](@transient sc: SparkContext, @transient

[GitHub] spark pull request: SPARK-5112. Expose SizeEstimator as a develope...

2015-04-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3913#issuecomment-97679846 Pending an update LGTM

[GitHub] spark pull request: [SPARK-7224] added mock repository generator f...

2015-04-30 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5790#discussion_r29409553 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -731,6 +734,10 @@ private[deploy] object SparkSubmitUtils

[GitHub] spark pull request: [SPARK-7224] added mock repository generator f...

2015-04-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5790#issuecomment-97884916 Looks good - thanks I can merge this.

[GitHub] spark pull request: SPARK-5112. Expose SizeEstimator as a develope...

2015-04-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3913#issuecomment-98039213 Hey @sryza and @srowen, sorry to vacillate here, but after looking at this more I really do think it would be better to just make this a static method on `SparkContext

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-98049642 @andrewor14 if you can update this I think this one is good to go.

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4450#issuecomment-98038092 LGTM pending a final test run. I want to get this in since there may be some other changes to shuffle interfaces due to some of the binary management stuff

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4450#issuecomment-98038047 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...

2015-04-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5430#issuecomment-98036464 Jenkins, retest this please. Thanks @zhzhan I reverted the patch.

[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...

2015-04-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5430#issuecomment-98005290 I took a pass and this LGTM. However it needs to be brought up to date.

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-30 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29405393 --- Diff: core/src/test/scala/org/apache/spark/util/ClosureCleanerSuite2.scala --- @@ -0,0 +1,562 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-30 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-97672308 Hey Andrew, This is looking good. The code is quite dense but as far as I can tell, this is correct. I left some more surface level comments, if you can get

[GitHub] spark pull request: [SPARK-7205] Support `.ivy2/local` and `.m2/re...

2015-04-29 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5755#issuecomment-97312152 LGTM - but just to be sure, since the unit tests don't actually cover loading a jar from the local maven or ivy cache, have you tested it locally? If you wanted to go

[GitHub] spark pull request: [SPARK-7205] Support `.ivy2/local` and `.m2/re...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5755#discussion_r29311326 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -722,13 +722,31 @@ private[deploy] object SparkSubmitUtils

[GitHub] spark pull request: [SPARK-7205] Support `.ivy2/local` and `.m2/re...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5755#discussion_r29311362 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -722,13 +722,31 @@ private[deploy] object SparkSubmitUtils

[GitHub] spark pull request: [SPARK-7205] Support `.ivy2/local` and `.m2/re...

2015-04-29 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5755#issuecomment-97316441 Okay thanks - we have SPARK-7224 for dealing with better automated testing. I'll merge this now.

[GitHub] spark pull request: [SPARK-6752][Streaming][Reopened] Allow Stream...

2015-04-29 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5773#issuecomment-97557255 LGTM

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29391102 --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala --- @@ -101,21 +115,124 @@ private[spark] object ClosureCleaner extends Logging

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29392113 --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala --- @@ -55,10 +58,14 @@ private[spark] object ClosureCleaner extends Logging

[GitHub] spark pull request: [SPARK-7224] added mock repository generator f...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5790#discussion_r29386023 --- Diff: core/src/test/scala/org/apache/spark/deploy/IvyTestUtils.scala --- @@ -0,0 +1,278 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29391891 --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala --- @@ -101,21 +115,124 @@ private[spark] object ClosureCleaner extends Logging

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29392231 --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala --- @@ -184,19 +335,17 @@ private[spark] object ClosureCleaner extends Logging

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29391976 --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala --- @@ -77,6 +80,9 @@ private[spark] object ClosureCleaner extends Logging

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29393859 --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala --- @@ -126,34 +243,66 @@ private[spark] object ClosureCleaner extends Logging

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29404655 --- Diff: core/src/test/scala/org/apache/spark/util/ClosureCleanerSuite2.scala --- @@ -0,0 +1,562 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7204] Fix callSite for Dataframe and SQ...

2015-04-28 Thread pwendell
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/5757 [SPARK-7204] Fix callSite for Dataframe and SQL operations This patch adds SQL to the set of excluded libraries when generating a callSite. This makes the callSite mechanism work properly

[GitHub] spark pull request: [minor] [core] Warn users who try to cache RDD...

2015-04-28 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5751#issuecomment-97177100 LGTM

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29206748 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29206723 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -740,15 +723,29 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29209219 --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29209101 --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java --- @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29209731 --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLog.java --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5645#issuecomment-96884076 I added some comments on the public interface. The main one is about whether we use opaque buffers to make the serialization of the segment identifier more explicit

[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29210080 --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java --- @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29209693 --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java --- @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29213656 --- Diff: streaming/src/main/java/org/apache/spark/streaming/util/WriteAheadLogSegment.java --- @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-7063 when lz4 compression is used, it ca...

2015-04-27 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5641#issuecomment-96780950 @srowen yeah maybe let's close this as a won't fix and just give the workaround for IBM users to upgrade the dependency themselves.

[GitHub] spark pull request: [SPARK-4925] Publish Spark SQL hive-thriftserv...

2015-04-27 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5429#issuecomment-96770220 @srowen thanks for pinging this. I will merge it. I ended up doing something manual for Spark 1.3.1 to allow us to publish this: http://search.maven.org

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113464 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113581 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113605 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113602 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96436586 Could it be confusing to users that the ID associated with each record might be different on stage or task retries? The fact that ordering within a partition

[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5403#issuecomment-96436773 By the way - if we did end up deciding to include this, I do feel that: 1. We should not mark this as solving SPARK-3376 (the goal there was to build

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112530 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113435 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113570 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113616 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112203 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed

[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5577#issuecomment-96422091 Seems good to me - @rxin any comments?

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112308 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112469 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -53,9 +53,14 @@ private[spark] abstract class BlockObjectWriter(val

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112559 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112322 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112320 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112547 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112553 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112635 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112713 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -53,9 +53,14 @@ private[spark] abstract class BlockObjectWriter(val

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112286 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29117710 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96463353 No, but the ordering of records in a partition can change, so you might have different identifiers for the same record across retries (unless this is only used

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29118071 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -740,15 +723,29 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96503022 Oh I see - I guess it doesn't matter then.

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96502816 @rxin yeah I just mean if I'm in a database and I run the same query twice, I will get the same row ID for the same record. Because of non determinism in the shuffle

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29118289 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -740,15 +723,29 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29118405 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29118412 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29118576 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29118645 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4450#issuecomment-96485873 Hey Sandy, I've now taken a pretty thorough look at this patch. There are a lot of low level comments and it would be nice if you could do a pass to bring

[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...

2015-04-25 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5430#issuecomment-96258683 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-25 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-96285701 @andrewor14 I manually triggered this using Josh's tool - looks like PRB is being temperamental.

[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.

2015-04-25 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5403#issuecomment-96269096 Actually I lied - in the codebase we do have some flags we use only for performance analysis. One is spark.shuffle.sync which forces writes to sync to disk much more

[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.

2015-04-25 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5403#issuecomment-96267909 My understanding was this wouldn't be an experimental feature in terms of how we've defined that in the past (i.e. it's not on a path to being something we'd expect

[GitHub] spark pull request: SPARK-6333 [CORE] Added compression option to ...

2015-04-25 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5030#issuecomment-96145516 Thanks for pinging this - yeah this is fine to do, just add a MIMA exception. We allow additions to JavaRDDLike trait, since the expectation is that trait

[GitHub] spark pull request: [SQL] [WIP] Partitioning support for the data ...

2015-04-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5526#issuecomment-96026476 @liancheng can you fill in the JIRA number here?

[GitHub] spark pull request: Upgrade the json4s version

2015-04-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5691#issuecomment-95990923 @moranmathias Do you mind creating a JIRA for this? It's good to have an audit trail of any version upgrades, since sometimes they cause maintenance issues downstream

[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

2015-04-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-95991323 @srowen wondering, what do you mean by local state? The closure cleaner only nulls out fields in _clones_ of local objects (actually the nulling mechanism is really

[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

2015-04-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29082998 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1740,7 +1740,8 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

2015-04-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5685#discussion_r29083093 --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala --- @@ -77,6 +80,9 @@ private[spark] object ClosureCleaner extends Logging

[GitHub] spark pull request: SPARK-7103: Fix crash with SparkContext.union ...

2015-04-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5679#issuecomment-96058807 Jenkins, test this please. This LGTM

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28990365 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala --- @@ -17,20 +17,167 @@ package org.apache.spark.ui.jobs -import

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28990248 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala --- @@ -17,20 +17,167 @@ package org.apache.spark.ui.jobs -import

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28991445 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -17,17 +17,170 @@ package org.apache.spark.ui.jobs

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28992394 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -17,17 +17,170 @@ package org.apache.spark.ui.jobs

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28992431 --- Diff: core/src/main/resources/org/apache/spark/ui/static/timeline-view.js --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28991508 --- Diff: core/src/main/resources/org/apache/spark/ui/static/timeline-view.js --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28991372 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -17,17 +17,170 @@ package org.apache.spark.ui.jobs

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28992311 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -17,17 +17,170 @@ package org.apache.spark.ui.jobs

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-95684815 I did another pass - this is looking pretty good, but I'd like to test it with more workloads. We should actually try to improve our current workload generator to do

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r29005754 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -17,17 +17,172 @@ package org.apache.spark.ui.jobs

[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r29004849 --- Diff: core/src/main/resources/org/apache/spark/ui/static/timeline-view.js --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software
