Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1131#issuecomment-46537074
A preferred task for one worker might be picked up by another worker at the
process/node/rack level only if they are at the same locality level: in which
case, it is irrelevant
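The locality-level rule described above can be sketched in plain Scala. This is a hypothetical simplification, not Spark's TaskSetManager code; the `TaskLocality` object here merely mirrors Spark's level names:

```scala
object TaskLocality extends Enumeration {
  // Ordered from most to least local, mirroring Spark's levels.
  val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
}

// A task preferred on one worker may be taken by another worker's
// process/node/rack offer only when both sit at the same locality level -
// in which case the distinction between the two workers is irrelevant.
def canTake(preferredLevel: TaskLocality.Value, offeredLevel: TaskLocality.Value): Boolean =
  preferredLevel == offeredLevel

println(canTake(TaskLocality.NODE_LOCAL, TaskLocality.NODE_LOCAL)) // true
println(canTake(TaskLocality.NODE_LOCAL, TaskLocality.ANY))        // false
```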
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1165#issuecomment-47424883
Instead of trying to fit data into memory which can't fit - isn't the current
behavior of Spark better: simply fail if there is insufficient memory - the user can
always run
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14324878
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager:
BlockManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14324897
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager:
BlockManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14324901
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager:
BlockManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14324910
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager:
BlockManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14324964
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -61,9 +61,9 @@ private[spark] class BlockManager(
// Actual
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14324977
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -55,8 +55,7 @@ private class MemoryStore(blockManager: BlockManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14324995
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -87,9 +86,7 @@ private class MemoryStore(blockManager: BlockManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14325021
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala ---
@@ -50,6 +50,16 @@ class PrimitiveVector[@specialized(Long, Int
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1165#issuecomment-47432066
Looks like an interesting patch @andrewor14 !
I added a few comments based on a reasonably cursory glance.
Will let someone else do a more detailed review
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1212#discussion_r14552579
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -431,6 +442,10 @@ private[spark] class TaskSchedulerImpl
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1212#discussion_r14552666
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -431,6 +442,10 @@ private[spark] class TaskSchedulerImpl
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1212#issuecomment-48024156
Looks good, thanks !
Please add a specific testcase which tests the change in behavior : namely,
what used to be ANY schedule earlier is not RACK_LOCAL.
---
If your
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1303#discussion_r14558471
--- Diff:
core/src/main/java/org/apache/spark/network/netty/FileServerHandler.java ---
@@ -80,4 +78,11 @@ public void exceptionCaught(ChannelHandlerContext
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1303#issuecomment-48041294
The config changes look promising (btw, you will need to read them from
SparkConf and not hardcode values) - the rest can be omitted.
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1313#discussion_r14588024
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -238,7 +238,7 @@ private[spark] class TaskSetManager
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1165#issuecomment-48502826
Gmail was marking all mails from GitHub as spam - apologies for the delay
in getting to this ! Did not see the updates.
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14722107
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager:
BlockManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14722187
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager:
BlockManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14722303
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -87,9 +86,7 @@ private class MemoryStore(blockManager: BlockManager
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1327#issuecomment-48504833
As @aarondav mentions, this class has never been 're-used'.
As part of our shuffle consolidation fixes, we actually depend on this
behavior (and enforce it via
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1313#discussion_r14723764
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -362,21 +363,25 @@ private[spark] class TaskSetManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14738178
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager:
BlockManager
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1165#discussion_r14739156
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -142,10 +151,76 @@ private[spark] class CacheManager(blockManager:
BlockManager
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1327#issuecomment-48571146
The reason to initialize size upfront and keep it immutable is to enforce
constraints on the writer since it is lazily initialized.
On Jul 10, 2014 10:29 AM
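The point about an upfront, immutable size can be sketched like this (a hypothetical writer class I made up for illustration, not the actual shuffle writer under discussion):

```scala
// size is fixed at construction (a val), so even though the underlying
// writer is created lazily, it can never observe a size that changed
// between construction and first use - the constraint is enforced up front.
final class LazyWriter(val size: Long) {
  require(size >= 0, "size must be known at construction")

  // Only materialized on first access, but its size is already immutable.
  lazy val writer: StringBuilder =
    new StringBuilder(s"opened for $size bytes")
}

val w = new LazyWriter(128)
// Nothing is opened yet; first access creates the writer with the fixed size.
println(w.writer.toString) // opened for 128 bytes
```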
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1363#issuecomment-48774986
On the contrary, we should remove all FileSystem.close() calls in Spark: since
shutdown hooks commit/flush pending messages to HDFS.
FileSystem objects are typically
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1212#issuecomment-48776662
s/not/now/ :-)
As in, to test the expected change in behavior for a specific host: before the
rack_local host gets added the schedules should be ANY; and after
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48835312
We have gone over this in the past .. it is suboptimal to make it a linear
function of executor/driver memory.
Overhead is a function of number of executors
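The two overhead models being debated can be sketched as follows. The 384 MB floor and 0.07 fraction are illustrative assumptions on my part, not Spark's actual settings:

```scala
// Flat lower-bound constant vs. a linear fraction of executor memory.
val lowerBoundMB = 384L

// Linear-fraction model: overhead grows with executor memory, floored at the constant.
def overheadLinear(executorMemoryMB: Long, fraction: Double = 0.07): Long =
  math.max(lowerBoundMB, (executorMemoryMB * fraction).toLong)

// For an 8 GB container the linear model predicts ~573 MB of overhead, while
// the thread reports real jobs needing up to 1.8 GB - the point being that
// overhead tracks job shape (e.g. number of executors), not heap size.
println(overheadLinear(8192)) // 573
println(overheadLinear(1024)) // 384 (the floor kicks in)
```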
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48835566
The default constant is actually a lower bound to account for other
overheads (since YARN will aggressively kill tasks)... Unfortunately we
have not sized
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48835618
That would be a function of your jobs.
Other apps would have drastically different characteristics ... Which is
why we can't generalize to a simple fraction
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48835656
The basic issue is you are trying to model overhead using the wrong
variable... It has no correlation with executor memory actually (other than
VM overheads as heap
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48835769
You are lucky :-) for some of our jobs, in an 8 GB container, overhead is
1.8 GB !
On 13-Jul-2014 2:41 pm, nishkamravi2 notificati...@github.com wrote:
Sean
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48836123
You are missing my point I think ... To give an unscientific anecdotal example:
our GBDT experiments, which run on about 22 nodes, need no tuning.
While our
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48836408
On Jul 13, 2014 3:16 PM, nishkamravi2 notificati...@github.com wrote:
Mridul, I think you are missing the point. We understand that this
parameter
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48836619
Hmm, looks like some of my responses to Sean via mail reply have not shown
up here ... Maybe mail gateway delays ?
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48836879
Since this is a recurring nightmare for our users, let me try to list the
factors which influence overhead, given the current Spark codebase state, in
the JIRA when
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-48849034
Hi @CodingCat looks good to me.
My only doubt, which we discussed last, was whether we want to
differentiate between tasks which have no locations at all vs tasks
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1415#issuecomment-48994982
Do we want to change the default for everything or only for shuffle ? (only
shuffle won't impact anything outside of Spark)
What would be the impact on user data if we change
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1415#issuecomment-48996763
@andrewor14 do we also log the block size, etc. of the codec used ?
If yes, then at least for event data we should be fine.
IIRC we use the codec to compress
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1415#issuecomment-49005728
Weird test failures - unrelated to this change
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1415#issuecomment-49005818
ah yes, block size is only used at compression time: and inferred from the
stream during decompression.
Then only the class name should be sufficient
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1212#issuecomment-49006034
hi @lirui-intel looks good to me !
Will merge when I get my laptop working again - unfortunate state of
affairs :-)
In meantime, if @pwendell or someone else
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1415#issuecomment-49006312
Can't comment on Tachyon since we don't use it and have no experience with it,
unfortunately.
I am fine with this change for the rest.
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-49200080
I just noticed that pendingTasksWithNotReadyPrefs is not being used now ?
It is getting updated but never actually queried ...
Do we need to maintain
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1212#issuecomment-49200501
Thanks @lirui-intel merged finally :-)
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-49207097
My rationale for suggesting noPrefs tasks after NODE_LOCAL was to ensure
that noPref tasks do not preempt locality for NODE_LOCAL tasks (they can't for
PROCESS_LOCAL
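The ordering being argued for can be sketched as a queue walk. This is a hypothetical simplification of TaskSetManager's internal pending-task queues, with made-up task IDs:

```scala
import scala.collection.mutable

case class Task(id: Int)

val processLocal = mutable.Queue(Task(1))
val nodeLocal    = mutable.Queue(Task(2))
val noPref       = mutable.Queue(Task(3)) // tasks with no preferred location
val rackLocal    = mutable.Queue(Task(4))
val anyLevel     = mutable.Queue(Task(5))

// noPref sits after NODE_LOCAL: a no-preference task never consumes an offer
// that a node-local task could have used, but it still runs before RACK_LOCAL.
def nextTask(): Option[Task] =
  Seq(processLocal, nodeLocal, noPref, rackLocal, anyLevel)
    .find(_.nonEmpty)
    .map(_.dequeue())

val order = (1 to 5).flatMap(_ => nextTask()).map(_.id)
println(order) // Vector(1, 2, 3, 4, 5)
```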
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1212#issuecomment-49214821
Thanks, but that is fine, I merged it in after I resolved my local hardware
issues today.
So I did not need to impose on you to merge after all !
On 17-Jul-2014
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-49215361
I am not sure if it is ok to wait ... This is something I never considered
from the beginning when I added process_local ... Maybe it is ok !
If it is not, then we
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-49221715
Same here, Kay any thoughts ?
On 17-Jul-2014 1:44 am, Matei Zaharia notificati...@github.com wrote:
Ah, that's true, it won't be as common now. Anyway I'd
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-49234491
Thinking about this more, if I am not wrong, the current scheduler can cause
suboptimal schedules when there are multiple TaskSetManagers. Particularly
relevant
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/159#discussion_r10647093
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -59,6 +59,15 @@ private[spark] class TaskSetManager(
// CPUs
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/126#discussion_r10701708
--- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala ---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/126#discussion_r10701805
--- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala ---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/126#discussion_r10702886
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1025,6 +1025,14 @@ abstract class RDD[T: ClassTag](
checkpointData.flatMap
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/126#issuecomment-37938604
Would be lovely to see this patch in !
We have a whole bunch of hacks just to avoid memory and resource issues
currently : this would alleviate a lot of them
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/182#issuecomment-38135323
On Wed, Mar 19, 2014 at 8:42 PM, Aaron Davidson
notificati...@github.comwrote:
This *is* an incompatible change, which is why it is targeted for 1.0
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/156#issuecomment-38353582
Some time in the future (not in this PR), we would want to add disk-backed
support for this feature: in case a partition is too large to fit
into memory
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/194#discussion_r10862584
--- Diff:
external/hbase/src/main/scala/org/apache/spark/nosql/hbase/SparkHBaseWriter.scala
---
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/194#discussion_r10862585
--- Diff:
external/hbase/src/main/scala/org/apache/spark/nosql/hbase/SparkHBaseWriter.scala
---
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/203#issuecomment-38354071
There are other places where Source.fromInputStream is getting used (for
example PipedRDD).
You might also want to look for places where UTF-8 is directly used
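The fix being suggested, sketched with the standard scala.io API: pass an explicit `Codec` instead of relying on the platform default charset (the sample input here is made up):

```scala
import java.io.ByteArrayInputStream
import scala.io.{Codec, Source}

val in = new ByteArrayInputStream("héllo".getBytes("UTF-8"))

// Explicit Codec.UTF8 - without it, Source falls back to the JVM's default
// charset, which corrupts non-ASCII bytes on platforms where it is not UTF-8.
val text = Source.fromInputStream(in)(Codec.UTF8).mkString
println(text) // héllo
```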
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/194#discussion_r10862643
--- Diff:
external/hbase/src/main/scala/org/apache/spark/nosql/hbase/HBaseUtils.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/194#discussion_r10862647
--- Diff:
external/hbase/src/main/scala/org/apache/spark/nosql/hbase/SparkHBaseWriter.scala
---
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/194#discussion_r10862656
--- Diff:
external/hbase/src/main/scala/org/apache/spark/nosql/hbase/SparkHBaseWriter.scala
---
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/182#issuecomment-38375109
Sounds good, as long as there is no behavioural change for existing
code/scripts which use 'local' as the default in most cases, additional
semantics are fine - at least
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/219#issuecomment-38608715
Looks good, thanks for the change - makes things cleaner.
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/228#discussion_r10948883
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -237,12 +240,7 @@ private object Accumulators {
// TODO: Use soft references
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/228#discussion_r10949181
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -925,6 +939,10 @@ class DAGScheduler
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/214#discussion_r10949666
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -243,9 +275,16 @@ private[spark] class TaskSchedulerImpl
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/214#issuecomment-38613780
In the coarse-grained scheduler, freeCores is updated once the task descriptions
are returned - in launchTasks; and it is expected to be used within the actor
thread (so MT-unsafe
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/214#issuecomment-38615338
Ah, I see what you mean - pull all of the logic within successful schedule
of resourceOffer into the caller.
Yeah, that should work fine (with the caveat of setting
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/214#issuecomment-38615396
Btw, we might want to make it a util method somewhere - so that the
various backends don't need to duplicate this code.
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/228#discussion_r10953086
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -814,8 +819,12 @@ class DAGScheduler(
event.reason match
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/232#issuecomment-38644232
Which dependency is providing this shaded version ?
Looks slightly dodgy to use a shaded version of some other package's asm ...
in case they change their dependency
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/242#issuecomment-38771396
fair enough, just wanted to ensure it was being done in preparation for a
disk-backed iterator (or at least that would be something which can be
implemented using
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/242#issuecomment-38880515
@pwendell agree, I was not referring to method overloading.
There is a lot of code already written which will require a rewrite if the
interface changes (how significant, I
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/228#discussion_r11094324
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -116,6 +116,9 @@ class DAGScheduler(
private val metadataCleaner
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/228#discussion_r11094329
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -789,6 +799,25 @@ class DAGScheduler
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/228#issuecomment-39003471
looks good @CodingCat ! just made a few minor points.
excellent change !!
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/279#issuecomment-39161560
Jenkins, retest this please
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/279#issuecomment-39161679
That is a fairly ambiguous message from jenkins :-) cc @pwendell
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/279#issuecomment-39163056
LGTM, pushed to master
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/279#issuecomment-39163251
Hmm, looks like the push failed ...
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/279#issuecomment-39178097
hey @pwendell so I am confused : did my commit actually go through :-)
I got a bunch of errors, and then it went through (at least the script said
so !). Now I am
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/286#issuecomment-39251945
LGTM, merged !
On Wed, Apr 2, 2014 at 12:20 AM, Shivaram Venkataraman
notificati...@github.com wrote:
I love hadoop-0.20.2 -- It is the best
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/301#issuecomment-39301994
lgtm
GitHub user mridulm opened a pull request:
https://github.com/apache/spark/pull/302
Fix the computePreferredLocations signature to not depend on the underlying
implementation
Change to Map and Set - not mutable HashMap and HashSet
You can merge this pull request into a Git
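The signature change can be sketched with a hypothetical method (the real one deals with split-location info; plain strings stand in here):

```scala
import scala.collection.mutable

// Before: the signature leaked the concrete mutable implementation types,
// forcing every caller to build exactly these classes.
def computePreferredLocationsOld(
    input: mutable.HashMap[String, mutable.HashSet[String]]): Seq[String] =
  input.keys.toSeq.sorted

// After: abstract Map/Set - callers may pass any implementation, including
// immutable ones, and the method can't be broken by implementation changes.
def computePreferredLocations(input: Map[String, Set[String]]): Seq[String] =
  input.keys.toSeq.sorted

val locs = computePreferredLocations(Map("part-0" -> Set("host1", "host2")))
println(locs) // List(part-0)
```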
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/302#issuecomment-39313257
Jenkins, this is ok to test
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/299#discussion_r11235466
--- Diff:
core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala ---
@@ -74,13 +74,17 @@ private[spark] class DriverRunner
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/299#discussion_r11235611
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
---
@@ -42,11 +42,16 @@ private[spark] class
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/299#discussion_r11235668
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -340,8 +341,22 @@ trait ClientBase extends Logging
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/299#discussion_r11236051
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -340,8 +341,22 @@ trait ClientBase extends Logging
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/302#issuecomment-39403132
Will make the changes, thx for the review @andrewor14
Motivation is interface cleanup - when I initially wrote it, it was more
for my local testing : and exposed it since
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/242#issuecomment-39404342
@pwendell @holdenk I don't have a good solution to keep compatibility, was
hoping you guys might :-)
This will require some rewrite, but I am hoping it won't be more
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/148#issuecomment-39653130
Just to be clear, the current status of the patch seems to be :
a) If user specified logging config - use that.
b) If missing, use a spark config built into the jar
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/274#issuecomment-39663561
Hi @pwendell, Can you please mark
* core/src/main/scala/org/apache/spark/util/collection/AppendOnlyMap.scala
*
core/src/main/scala/org/apache/spark/util
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/344#issuecomment-40178755
Merging this into master.
@tgravescs, @pwendell if this is relevant to 1.0, please do merge into that
branch too.
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/394#issuecomment-40261852
Jenkins, test this please.
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/266#issuecomment-40273037
I did not notice this earlier.
The toByteArray method is insanely expensive for anything nontrivial.
A better solution would be to replace use
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/266#issuecomment-40273651
I think we can replace it with a custom impl - where we decide that it is
ok to waste some memory within some threshold in case the copy is much more
expensive
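One shape such a custom impl could take (an assumption on my part, not the PR's actual code): subclass ByteArrayOutputStream and expose its protected buf/count fields, trading some slack capacity for skipping the full copy that toByteArray performs:

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

class ExposedByteArrayOutputStream(initialSize: Int)
    extends ByteArrayOutputStream(initialSize) {
  // buf and count are protected in ByteArrayOutputStream, so a subclass can
  // wrap the live buffer directly - no copy, at the cost of the unused slack.
  def toByteBuffer: ByteBuffer = ByteBuffer.wrap(buf, 0, count)
}

val out = new ExposedByteArrayOutputStream(64)
out.write("hello".getBytes("UTF-8"))
val bb = out.toByteBuffer
println(bb.remaining) // 5 valid bytes; the 64-byte backing array was not copied
```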