[GitHub] spark pull request: [SPARK-2193] Improve tasks preferred locality b...

2014-06-19 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1131#issuecomment-46537074 A preferred task for one worker might be picked up by another worker on process/node/rack only if they are at the same locality level : in which case, it is irrelevant
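
For readers skimming the thread, a minimal, self-contained sketch (plain Scala, not Spark's actual scheduler classes) of the idea that an offer only matters relative to the locality level currently allowed:

    object LocalityLevelSketch {
      // Ordered from most to least specific; Enumeration values compare by
      // declaration order, so PROCESS_LOCAL < NODE_LOCAL < RACK_LOCAL < ANY.
      object Locality extends Enumeration {
        val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
      }
      import Locality._

      // An offer at level `offered` is acceptable while the scheduler is still
      // restricting itself to levels no worse than `allowed`.
      def acceptable(offered: Locality.Value, allowed: Locality.Value): Boolean =
        offered <= allowed

      def main(args: Array[String]): Unit = {
        assert(acceptable(NODE_LOCAL, RACK_LOCAL)) // a better-than-required offer is fine
        assert(!acceptable(ANY, NODE_LOCAL))       // too far while waiting for node locality
      }
    }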

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1165#issuecomment-47424883 Instead of trying to fit data into memory which can't be - isn't the current behavior of spark better: simply fail if insufficient memory - user can always run

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14324878 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14324897 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14324901 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14324910 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14324964 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -61,9 +61,9 @@ private[spark] class BlockManager( // Actual

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14324977 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -55,8 +55,7 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14324995 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -87,9 +86,7 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14325021 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala --- @@ -50,6 +50,16 @@ class PrimitiveVector[@specialized(Long, Int

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-06-28 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1165#issuecomment-47432066 Looks like an interesting patch @andrewor14 ! I added a few comments based on a reasonably cursory glance. Will let someone else do a more detailed review

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1212#discussion_r14552579 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -431,6 +442,10 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1212#discussion_r14552666 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -431,6 +442,10 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-04 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-48024156 Looks good, thanks ! Please add a specific testcase which tests the change in behavior : namely, what used to be ANY schedule earlier is not RACK_LOCAL.

[GitHub] spark pull request: SPARK-2368 Making FileServerHandler sends grac...

2014-07-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1303#discussion_r14558471 --- Diff: core/src/main/java/org/apache/spark/network/netty/FileServerHandler.java --- @@ -80,4 +78,11 @@ public void exceptionCaught(ChannelHandlerContext

[GitHub] spark pull request: SPARK-2368 Making FileServerHandler sends grac...

2014-07-04 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1303#issuecomment-48041294 The config changes look promising (btw, you will need to read them from SparkConf and not hardcode values) - rest can be omitted.

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r14588024 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -238,7 +238,7 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-09 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1165#issuecomment-48502826 gmail was marking all mails from github as spam - apologies for the delay in getting to this ! Did not see the updates.

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-09 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14722107 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-09 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14722187 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-09 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14722303 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -87,9 +86,7 @@ private class MemoryStore(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

2014-07-09 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1327#issuecomment-48504833 As @aarondav mentions, this class has never been 're-used'. As part of our shuffle consolidation fixes, we actually depend on this behavior (and enforce it via

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-09 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r14723764 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -362,21 +363,25 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-09 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14738178 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-09 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r14739156 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager: BlockManager

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

2014-07-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1327#issuecomment-48571146 The reason to initialize size upfront and keep it immutable is to enforce constraints on the writer since it is lazily initialized. On Jul 10, 2014 10:29 AM
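
A hypothetical sketch of the design rationale quoted above (names invented here, not the writer class under discussion): fixing the size at construction lets the lazily created writer treat it as an invariant.

    // `size` is immutable, so by the time the writer is lazily created it is a
    // reliable constraint rather than a field that might change underneath it.
    class LazySizedWriterSketch(val size: Int) {
      require(size > 0, "size must be decided up front")

      private lazy val writer = new java.io.ByteArrayOutputStream(size)

      def write(bytes: Array[Byte]): Unit = {
        require(writer.size + bytes.length <= size, "would exceed the declared size")
        writer.write(bytes)
      }
    }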

[GitHub] spark pull request: [SPARK-2439] Actually close FileSystem in Mast...

2014-07-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1363#issuecomment-48774986 On the contrary, we should remove all filesystem.close() in spark : since shutdown hooks commit/flush pending messages to hdfs. filesystem object's are typically
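
A short sketch of the hazard being described, assuming default Hadoop client behavior (FileSystem instances are cached and shared per scheme, authority and user):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object FileSystemCloseSketch {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        val fs1 = FileSystem.get(conf)
        val fs2 = FileSystem.get(conf) // same cached instance as fs1 by default

        fs1.close()                    // closes the shared, cached instance

        // Any later use of fs2 - including shutdown hooks that still need to
        // flush to HDFS - now operates on a closed filesystem; with HDFS this
        // typically fails with java.io.IOException: "Filesystem closed".
        fs2.exists(new Path("/tmp"))
      }
    }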

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-48776662 s/not/now/ :-) As in, to test the expected change in behavior: for a specific host, before the rack_local host gets added the schedules should be ANY; and after
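
A self-contained illustration (hypothetical code, not Spark's TaskSchedulerImpl or its test suite) of the before/after behavior such a testcase would pin down:

    // Once the scheduler tracks which hosts live on each rack, a task whose
    // preferred host is unavailable can be scheduled RACK_LOCAL on another
    // host of the same rack instead of falling back to ANY.
    object RackLocalitySketch {
      def localityFor(preferredHost: String, offeredHost: String,
                      hostsByRack: Map[String, Set[String]]): String = {
        if (preferredHost == offeredHost) "NODE_LOCAL"
        else if (hostsByRack.exists { case (_, hosts) =>
          hosts.contains(preferredHost) && hosts.contains(offeredHost) }) "RACK_LOCAL"
        else "ANY"
      }

      def main(args: Array[String]): Unit = {
        // Before the rack-local host is tracked: only ANY is possible.
        assert(localityFor("hostA", "hostB", Map.empty) == "ANY")
        // After hostA and hostB are registered on rack1: RACK_LOCAL.
        val tracked = Map("rack1" -> Set("hostA", "hostB"), "rack2" -> Set("hostC"))
        assert(localityFor("hostA", "hostB", tracked) == "RACK_LOCAL")
      }
    }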

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48835312 We have gone over this in the past .. it is suboptimal to make it a linear function of executor/driver memory. Overhead is a function of number of executors
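
To make the two positions concrete, an illustrative sketch (the numbers are examples, not the defaults proposed in this PR) of a fixed floor versus a fraction-of-heap policy, usually combined as a max of the two:

    object YarnOverheadSketch {
      val overheadFloorMb = 384    // fixed lower bound, in MB
      val overheadFraction = 0.10  // fraction of executor memory

      def memoryOverheadMb(executorMemoryMb: Int): Int =
        math.max(overheadFloorMb, (executorMemoryMb * overheadFraction).toInt)

      def main(args: Array[String]): Unit = {
        println(memoryOverheadMb(2048)) // 384: the floor dominates for small heaps
        println(memoryOverheadMb(8192)) // 819: the fraction dominates for large heaps
        // The objection in this thread: real overhead also depends on the number
        // of executors, cores, shuffle behavior, etc., so no single fraction of
        // the heap fits all jobs.
      }
    }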

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48835566 The default constant is actually a lowerbound to account for other overheads (since yarn will aggressively kill tasks)... Unfortunately we have not sized

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48835618 That would be a function of your jobs. Other apps would have drastically different characteristics ... Which is why we can't generalize to a simple fraction

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48835656 The basic issue is you are trying to model overhead using the wrong variable... It has no correlation with executor memory actually (other than vm overheads as heap

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48835769 You are lucky :-) for some of our jobs, in an 8gb container, overhead is 1.8gb ! On 13-Jul-2014 2:41 pm, nishkamravi2 notificati...@github.com wrote: Sean

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48836123 You are missing my point I think ... To give an unscientific anecdotal example: our gbdt experiments, which run on about 22 nodes, need no tuning. While our

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48836408 On Jul 13, 2014 3:16 PM, nishkamravi2 notificati...@github.com wrote: Mridul, I think you are missing the point. We understand that this parameter

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48836619 Hmm, looks like some of my responses to Sean via mail reply have not shown up here ... Maybe mail gateway delays ?

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48836879 Since this is a recurring nightmare for our users, let me try to list the factors which influence overhead, given the current spark codebase state, in the jira when

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-48849034 Hi @CodingCat looks good to me. My only doubt, which we discussed last, was whether we want to differentiate between tasks which have no locations at all vs tasks

[GitHub] spark pull request: SPARK-2469: Use Snappy (instead of LZF) for de...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48994982 Do we want to change the default for everything or only for shuffle ? (only shuffle won't impact anything outside of spark) What would be the impact on user data if we change

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996763 @andrewor14 do we also log the block size, etc. of the codec used ? If yes, then at least for event data we should be fine. IIRC we use the codec to compress

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005728 Weird test failures - unrelated to this change

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005818 ah yes, block size is only used at compression time, and inferred from the stream during decompression. Then only the class name should be sufficient
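
For reference, the codec is selected purely by configuration; a sketch of switching it (the exact key names and accepted short names depend on the Spark version, so treat these as examples rather than settled defaults):

    import org.apache.spark.SparkConf

    object CodecConfigSketch {
      val conf = new SparkConf()
        .setAppName("codec-example")
        // The full class name works everywhere; newer versions also accept "snappy"/"lzf".
        .set("spark.io.compression.codec", "org.apache.spark.io.SnappyCompressionCodec")
        // Block size is a write-side knob only; decompression infers it from the stream.
        .set("spark.io.compression.snappy.block.size", "32768")
    }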

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49006034 hi @lirui-intel looks good to me ! Will merge when I get my laptop working again - unfortunate state of affairs :-) In the meantime, if @pwendell or someone else

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49006312 Can't comment on tachyon since we don't use it and have no experience with it unfortunately. I am fine with this change for the rest.

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49200080 I just noticed that pendingTasksWithNotReadyPrefs is not being used now ? It is getting updated but never actually queried from ... Do we need to maintain

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49200501 Thanks @lirui-intel merged finally :-)

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49207097 My rationale for suggesting noPrefs tasks after NODE_LOCAL was to ensure that noPref tasks do not preempt locality for NODE_LOCAL tasks (they can't for PROCESS_LOCAL

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49214821 Thanks, but that is fine, I merged it in after I resolved my local hardware issues today. So did not need to impose on you to merge after all ! On 17-Jul-2014

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49215361 I am not sure if it is ok to wait ... This is something I never considered from the beginning when I added process_local ... Maybe it is ok ! If it is not, then we

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49221715 Same here, Kay any thoughts ? On 17-Jul-2014 1:44 am, Matei Zaharia notificati...@github.com wrote: Ah, that's true, it won't be as common now. Anyway I'd

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49234491 Thinking about this more, if I am not wrong, current scheduler can cause suboptimal schedules when there are multiple tasksetmanagers. Particularly relevant

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-17 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/159#discussion_r10647093 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -59,6 +59,15 @@ private[spark] class TaskSetManager( // CPUs

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10701708 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10701805 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10702886 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1025,6 +1025,14 @@ abstract class RDD[T: ClassTag]( checkpointData.flatMap

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-18 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/126#issuecomment-37938604 Would be lovely to see this patch in ! We have a whole bunch of hacks just to avoid memory and resource issues currently : this would alleviate a lot of them

[GitHub] spark pull request: SPARK-1099: Make local mode use all available ...

2014-03-19 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/182#issuecomment-38135323 On Wed, Mar 19, 2014 at 8:42 PM, Aaron Davidson notificati...@github.com wrote: This *is* an incompatible change, which is why it is targeted for 1.0

[GitHub] spark pull request: [SPARK-1259] Make RDD locally iterable

2014-03-22 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/156#issuecomment-38353582 Some time in the future (not in this PR), we would want to add disk-backed support for this feature: in case a partition is too large to fit into memory
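
Usage-wise, the feature under review lets callers stream an RDD to the driver one partition at a time; a small sketch (assuming the method lands as RDD.toLocalIterator, per the PR):

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalIteratorSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("iter"))
        val rdd = sc.parallelize(1 to 1000000, numSlices = 100)
        // Pulls one partition at a time to the driver instead of collect()-ing everything.
        rdd.toLocalIterator.take(5).foreach(println)
        sc.stop()
      }
    }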

[GitHub] spark pull request: SPARK-1127 Add spark-hbase.

2014-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/194#discussion_r10862584 --- Diff: external/hbase/src/main/scala/org/apache/spark/nosql/hbase/SparkHBaseWriter.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1127 Add spark-hbase.

2014-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/194#discussion_r10862585 --- Diff: external/hbase/src/main/scala/org/apache/spark/nosql/hbase/SparkHBaseWriter.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: fix java.nio.charset.MalformedInputException

2014-03-22 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/203#issuecomment-38354071 There are other places where Source.fromInputStream is getting used (for example PipedRDD); you might also want to look for places where utf-8 is directly used
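
A sketch of the kind of fix being suggested: pass an explicit codec to Source instead of relying on the JVM default charset, which is what makes MalformedInputException locale-dependent.

    import java.io.ByteArrayInputStream
    import java.nio.charset.StandardCharsets
    import scala.io.{Codec, Source}

    object CharsetSketch {
      def main(args: Array[String]): Unit = {
        val bytes = "héllo".getBytes(StandardCharsets.UTF_8)
        // Explicit UTF-8 codec; without it, Source decodes with the platform default.
        val text = Source.fromInputStream(new ByteArrayInputStream(bytes))(Codec.UTF8).mkString
        println(text)
      }
    }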

[GitHub] spark pull request: SPARK-1127 Add spark-hbase.

2014-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/194#discussion_r10862643 --- Diff: external/hbase/src/main/scala/org/apache/spark/nosql/hbase/HBaseUtils.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-1127 Add spark-hbase.

2014-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/194#discussion_r10862647 --- Diff: external/hbase/src/main/scala/org/apache/spark/nosql/hbase/SparkHBaseWriter.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1127 Add spark-hbase.

2014-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/194#discussion_r10862656 --- Diff: external/hbase/src/main/scala/org/apache/spark/nosql/hbase/SparkHBaseWriter.scala --- @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1099: Introduce local[*] mode to infer n...

2014-03-23 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/182#issuecomment-38375109 Sounds good, as long as there is no behavioural change for existing code/scripts which use 'local' as the default in most cases, additional semantics are fine - at least
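
The resulting master strings, for reference; plain 'local' keeps its existing single-thread behavior, while 'local[N]' and the new 'local[*]' opt in to more cores:

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalMasterSketch {
      def main(args: Array[String]): Unit = {
        // "local"    -> single worker thread (existing default behavior)
        // "local[4]" -> four worker threads
        // "local[*]" -> one thread per available core (the new semantics)
        val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("local-star"))
        println(sc.defaultParallelism)
        sc.stop()
      }
    }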

[GitHub] spark pull request: Fix scheduler to account for tasks using 1 C...

2014-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/219#issuecomment-38608715 Looks good, thanks for the change - makes things cleaner.

[GitHub] spark pull request: SPARK-732: eliminate duplicate update of the a...

2014-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/228#discussion_r10948883 --- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala --- @@ -237,12 +240,7 @@ private object Accumulators { // TODO: Use soft references

[GitHub] spark pull request: SPARK-732: eliminate duplicate update of the a...

2014-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/228#discussion_r10949181 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -925,6 +939,10 @@ class DAGScheduler

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10949666 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -243,9 +275,16 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38613780 In the coarse grained scheduler, freeCores is updated once the task descriptions are returned - in launchTasks; and expected to be used within the actor thread (so MT-unsafe

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38615338 Ah, I see what you mean - pull all of the logic within successful schedule of resourceOffer into the caller. Yeah, that should work fine (with the caveat of setting

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38615396 Btw, we might want to make it some util method somewhere - so that the various backends don't need to duplicate this code.

[GitHub] spark pull request: SPARK-732: eliminate duplicate update of the a...

2014-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/228#discussion_r10953086 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -814,8 +819,12 @@ class DAGScheduler( event.reason match

[GitHub] spark pull request: [SPARK-782] Made Spark use existing shaded ASM...

2014-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/232#issuecomment-38644232 Which dependency is providing this shaded version ? Looks slightly dodgy to use a shaded version of some other package's asm ... in case they change their dependency

[GitHub] spark pull request: [WIP] Spark 1271 (1320) cogroup and groupby sh...

2014-03-26 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/242#issuecomment-38771396 fair enough, just wanted to ensure it was being done in preparation for a disk-backed iterator (or at least that would be something which can be implemented using

[GitHub] spark pull request: [WIP] Spark 1271 (1320) cogroup and groupby sh...

2014-03-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/242#issuecomment-38880515 @pwendell agree, I was not referring to method overloading. There is a lot of code already written which will require a rewrite if the interface changes (how significant, I

[GitHub] spark pull request: SPARK-732: eliminate duplicate update of the a...

2014-03-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/228#discussion_r11094324 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -116,6 +116,9 @@ class DAGScheduler( private val metadataCleaner

[GitHub] spark pull request: SPARK-732: eliminate duplicate update of the a...

2014-03-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/228#discussion_r11094329 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -789,6 +799,25 @@ class DAGScheduler

[GitHub] spark pull request: SPARK-732: eliminate duplicate update of the a...

2014-03-29 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/228#issuecomment-39003471 looks good @CodingCat ! just made a few minor points. excellent change !!

[GitHub] spark pull request: SPARK-1376. In the yarn-cluster submitter, ren...

2014-03-31 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/279#issuecomment-39161560 Jenkins, retest this please

[GitHub] spark pull request: SPARK-1376. In the yarn-cluster submitter, ren...

2014-03-31 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/279#issuecomment-39161679 That is a fairly ambiguous message from jenkins :-) cc @pwendell

[GitHub] spark pull request: SPARK-1376. In the yarn-cluster submitter, ren...

2014-03-31 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/279#issuecomment-39163056 LGTM, pushed to master

[GitHub] spark pull request: SPARK-1376. In the yarn-cluster submitter, ren...

2014-03-31 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/279#issuecomment-39163251 Hmm, looks like the push failed ...

[GitHub] spark pull request: SPARK-1376. In the yarn-cluster submitter, ren...

2014-04-01 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/279#issuecomment-39178097 hey @pwendell so I am confused : did my commit actually go through :-) I got a bunch of errors, and then it went through (at least the script said so !). Now I am

[GitHub] spark pull request: Merge Hadoop Into Spark

2014-04-01 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/286#issuecomment-39251945 LGTM, merged ! On Wed, Apr 2, 2014 at 12:20 AM, Shivaram Venkataraman notificati...@github.com wrote: I love hadoop-0.20.2 -- It is the best

[GitHub] spark pull request: Renamed stageIdToActiveJob to jobIdToActiveJob...

2014-04-02 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/301#issuecomment-39301994 lgtm

[GitHub] spark pull request: fix computePreferredLocations signature to not...

2014-04-02 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/302 fix computePreferredLocations signature to not depend on underlying implementation. Change to Map and Set - not mutable HashMap and HashSet.
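
An illustration of the signature cleanup described in the PR (simplified, hypothetical method and types): expose plain Map/Set to callers rather than the mutable HashMap/HashSet used internally.

    import scala.collection.mutable

    object PreferredLocationsSketch {
      // Before: leaks the mutable implementation types to callers.
      def computePreferredLocationsOld(
          input: mutable.HashMap[String, mutable.HashSet[String]]): mutable.HashSet[String] =
        input.values.foldLeft(mutable.HashSet.empty[String])(_ ++= _)

      // After: immutable Map/Set in the signature; the implementation stays free
      // to build the result however it likes.
      def computePreferredLocations(input: Map[String, Set[String]]): Set[String] =
        input.values.foldLeft(Set.empty[String])(_ ++ _)
    }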

[GitHub] spark pull request: [SPARK-1371] fix computePreferredLocations sig...

2014-04-02 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/302#issuecomment-39313257 Jenkins, this is ok to test

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11235466 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala --- @@ -74,13 +74,17 @@ private[spark] class DriverRunner

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11235611 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala --- @@ -42,11 +42,16 @@ private[spark] class

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11235668 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -340,8 +341,22 @@ trait ClientBase extends Logging

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

2014-04-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/299#discussion_r11236051 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -340,8 +341,22 @@ trait ClientBase extends Logging

[GitHub] spark pull request: [SPARK-1371] fix computePreferredLocations sig...

2014-04-02 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/302#issuecomment-39403132 Will make the changes, thx for the review @andrewor14. Motivation is interface cleanup - when I initially wrote it, it was more for my local testing: and exposed it since

[GitHub] spark pull request: Spark 1271 (1320) cogroup and groupby should p...

2014-04-02 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/242#issuecomment-39404342 @pwendell @holdenk I don't have a good solution to keep compatibility, was hoping you guys might :-) This will require some rewrite, but I am hoping it won't be more

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-04-05 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-39653130 Just to be clear, the current status of the patch seems to be : a) If user specified logging config - use that. b) If missing, use a spark config built into the jar

[GitHub] spark pull request: SPARK-1093: Annotate developer and experimenta...

2014-04-06 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/274#issuecomment-39663561 Hi @pwendell, Can you please mark * core/src/main/scala/org/apache/spark/util/collection/AppendOnlyMap.scala * core/src/main/scala/org/apache/spark/util

[GitHub] spark pull request: SPARK-1417: Spark on Yarn - spark UI link from...

2014-04-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/344#issuecomment-40178755 Merging this into master. @tgravescs, @pwendell if this is relevant to 1.0, please do merge into that branch too.

[GitHub] spark pull request: Yarn: do not set local IP in remote process en...

2014-04-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/394#issuecomment-40261852 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1057 (alternative) Remove fastutil

2014-04-12 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/266#issuecomment-40273037 I did not notice this earlier. The toByteArray method is insanely expensive for anything nontrivial. A better solution would be to replace use

[GitHub] spark pull request: SPARK-1057 (alternative) Remove fastutil

2014-04-12 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/266#issuecomment-40273651 I think we can replace it with a custom impl - where we decide that it is ok to waste some memory within some threshold in case the copy is much more expensive
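
A sketch of the kind of custom implementation being proposed (a hypothetical class, not the one that eventually landed): grow an internal array and expose it together with a length, trading some slack capacity for avoiding the full copy that ByteArrayOutputStream.toByteArray performs.

    import java.io.OutputStream

    class ExposedByteArrayOutputStream(initialCapacity: Int = 64) extends OutputStream {
      private var buf = new Array[Byte](initialCapacity)
      private var count = 0

      private def ensureCapacity(extra: Int): Unit = {
        if (count + extra > buf.length) {
          buf = java.util.Arrays.copyOf(buf, math.max(buf.length * 2, count + extra))
        }
      }

      override def write(b: Int): Unit = {
        ensureCapacity(1)
        buf(count) = b.toByte
        count += 1
      }

      override def write(b: Array[Byte], off: Int, len: Int): Unit = {
        ensureCapacity(len)
        System.arraycopy(b, off, buf, count, len)
        count += len
      }

      // No defensive copy: callers must respect `size` and not read past it.
      def internalBuffer: Array[Byte] = buf
      def size: Int = count
    }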
