Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/16241
Looks good to me.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/12913#discussion_r84570152
--- Diff:
core/src/test/scala/org/apache/spark/serializer/UnsafeKryoSerializerSuite.scala
---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/12913
@techaddict Cool, thanks! Just remembered a couple more things:
- Can you edit KryoSerializerSuite to set the flag to false? Otherwise we
might silently end up with both suites testing on true
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/12913#discussion_r83972911
--- Diff:
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -75,9 +75,11 @@ class KryoSerializerSuite extends SparkFunSuite
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/12913
Looks pretty good overall! I made two small comments but it seems
worthwhile to add in and it's not a huge change.
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/12913#discussion_r83971706
--- Diff:
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -75,9 +75,11 @@ class KryoSerializerSuite extends SparkFunSuite
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/12913#discussion_r83971396
--- Diff:
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -78,8 +79,14 @@ class KryoSerializer(conf: SparkConf)
.filter
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/8318
Probably switching from the PySpark in PyPI to a version you installed
locally by downloading Spark.
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/8318
Yes, it would be great to get this done. Just make sure that we have a good
way to test it. Can you also document how a user is supposed to switch to a
different pyspark (if they do have Spark
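The switching mechanism being asked about can be sketched in a few lines. This is a hypothetical helper (the name `use_local_pyspark` and the exact lookup logic are assumptions, not Spark's actual mechanism); it only relies on the standard Spark distribution layout, where PySpark lives under `$SPARK_HOME/python` with its bundled py4j zip in `$SPARK_HOME/python/lib`:

```python
import os
import sys

def use_local_pyspark(spark_home):
    """Prepend a local Spark distribution's PySpark to sys.path so it
    shadows any pip-installed pyspark. Assumes the usual layout:
    <spark_home>/python and <spark_home>/python/lib/py4j-*.zip."""
    python_dir = os.path.join(spark_home, "python")
    lib_dir = os.path.join(python_dir, "lib")
    # The py4j zip name varies by release, so pick up whatever is bundled.
    if os.path.isdir(lib_dir):
        for name in os.listdir(lib_dir):
            if name.startswith("py4j"):
                sys.path.insert(0, os.path.join(lib_dir, name))
    sys.path.insert(0, python_dir)
    return python_dir
```

Because the local copy is inserted at the front of `sys.path`, a subsequent `import pyspark` resolves to the downloaded Spark rather than the PyPI package.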
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/8318
Cool, good to know that there's another ASF project that does it. We should
go for it then.
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/8318
BTW the other change now is that we don't make an assembly JAR by default
anymore, though we could build one for this. We just need a build script for
this that's solid, produces a release-policy
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/8318
Something like this would be great IMO. A few questions though:
* How will it work if users want to run a different version of PySpark from
a different version of Spark (maybe something
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/14956
Cool, thanks for improving the PIC test.
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/14956#discussion_r78270573
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala
---
@@ -395,7 +395,7 @@ object PowerIterationClustering
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/14956
Cool, then it does make sense to change it.
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/14956
I think the number 5 is indeed from that paper (I think from figure 5.1
actually), but have you tested the effect of using R=2 empirically? It would be
good to check that they match what's
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/14956#discussion_r77874667
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala
---
@@ -395,7 +395,7 @@ object PowerIterationClustering
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/14931#discussion_r77294974
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -346,15 +346,16 @@ private[spark] class TaskSchedulerImpl
GitHub user mateiz opened a pull request:
https://github.com/apache/spark/pull/13748
[SPARK-16031] Add debug-only socket source in Structured Streaming
## What changes were proposed in this pull request?
This patch adds a text-based socket source similar to the one in Spark
Github user mateiz commented on the issue:
https://github.com/apache/spark/pull/13609
Looks good to me.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/13133#issuecomment-219463999
Cool, the change does look right to me, but as Sean said there are some
style issues. It should definitely help speed up initialization!
GitHub user mateiz opened a pull request:
https://github.com/apache/spark/pull/12140
[SPARK-14356] Update spark.sql.execution.debug to work on Datasets
## What changes were proposed in this pull request?
Update DebugQuery to work on Datasets of any type, not just DataFrames
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/10092#issuecomment-161365442
It might be nice to only expose a smaller # of storage levels in Python,
i.e. call them memory_only and memory_and_disk, but always use the serialized
ones underneath
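The naming idea above can be sketched in plain Python. This is a toy stand-in, not PySpark's actual `StorageLevel` class: only two names are exposed to users, and both are backed by the serialized variants (`deserialized=False`), which is all that matters on the Python side since the data is pickled anyway:

```python
from collections import namedtuple

# Minimal stand-in for a storage level: just the flags relevant here.
StorageLevel = namedtuple("StorageLevel", ["use_disk", "use_memory", "deserialized"])

# Expose a small set of names, but always use the serialized variants
# underneath (deserialized=False).
MEMORY_ONLY = StorageLevel(use_disk=False, use_memory=True, deserialized=False)
MEMORY_AND_DISK = StorageLevel(use_disk=True, use_memory=True, deserialized=False)
```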
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44303122
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/StateSpec.scala ---
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44299243
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala
---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44299415
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9555#discussion_r44238871
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SumOf.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9555#discussion_r44238545
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/9555#issuecomment-154909428
The user-facing API looks good to me! I added some comments on the internal
interfaces though.
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9555#discussion_r44238765
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SumOf.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9555#discussion_r44238638
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -39,10 +39,10 @@ private[sql] object Column
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44213675
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala
---
@@ -351,6 +351,50 @@ class PairDStreamFunctions[K, V
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44213622
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44214012
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/StateSpec.scala ---
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44213899
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala
---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44213890
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala
---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44213932
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala
---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44213921
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala
---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44214006
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/StateSpec.scala ---
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44214008
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/StateSpec.scala ---
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44214064
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/rdd/TrackStateRDD.scala ---
@@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44214022
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9256#discussion_r44214016
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9415#discussion_r43712005
--- Diff:
examples/src/main/scala/org/apache/spark/examples/sql/DatasetWordCount.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9415#discussion_r43712012
--- Diff:
examples/src/main/scala/org/apache/spark/examples/sql/DatasetWordCount.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/9398#issuecomment-153226208
Thanks for adding this! The UI itself looks good to me.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/9214#issuecomment-151209206
Maybe just go for version 2) above then, it seems like the simplest one.
Regarding re-engineering vs not, the problem is that if you're trying to do
a bug fix
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/9214#issuecomment-150708764
Hey so I'm curious about two things here:
1) If we just always replaced the output with a new one using a file
rename, would we actually have a problem? I think
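The atomic-rename idea raised in the question can be sketched as follows. This is a generic pattern, not the code under review: write to a temporary file in the same directory, then rename it over the destination, so readers observe either the old contents or the new, never a partial write:

```python
import os
import tempfile

def write_atomically(path, data):
    """Replace the file at `path` with `data` via an atomic rename.
    os.replace is atomic on POSIX (and on Windows for same-volume
    paths), so a concurrent reader never sees a half-written file."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure the bytes hit disk before the swap
        os.replace(tmp, path)     # the atomic swap
    except BaseException:
        os.unlink(tmp)
        raise
```

The temp file must live in the same directory as the target so the rename stays within one filesystem.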
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9219#discussion_r42764498
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala ---
@@ -43,35 +43,53 @@ private[spark] class ShuffleMapStage(
val
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9190#discussion_r42702840
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9190#discussion_r42702889
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala
---
@@ -31,6 +31,7 @@ import org.apache.spark.sql.types.StructType
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9190#discussion_r42702980
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala
---
@@ -46,13 +47,27 @@ trait Encoder[T
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9190#discussion_r42703300
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala
---
@@ -46,13 +47,27 @@ trait Encoder[T
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/9175#discussion_r42573604
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -353,10 +353,15 @@ class DAGScheduler
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/6648#issuecomment-147553320
BTW, with that design, I also wouldn't even implement the delete message in
the first patch, unless we've actually seen block corruptions happen; but it
sounds like we
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/6648#issuecomment-147552582
Hey Imran,
Given the number of changes required for this approach, I wonder whether an
atomic rename design wouldn't be simpler (in particular, the "
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8844#issuecomment-143113277
Alright, merged this, thanks.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8844#issuecomment-142703433
Alright, I made the suggested changes. I don't think we need to make those
classes `private[spark]` because they are in `src/test`, right?
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8844#issuecomment-142802670
Alright, let me know if you guys have any other comments.
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/8844#discussion_r40170084
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/CustomShuffledRDD.scala ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/8844#discussion_r40026880
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -323,6 +351,30 @@ private[spark] class MapOutputTrackerMaster(conf:
SparkConf
GitHub user mateiz opened a pull request:
https://github.com/apache/spark/pull/8844
[SPARK-9852] Let reduce tasks fetch multiple map output partitions
This makes two changes:
- Allow reduce tasks to fetch multiple map output partitions -- this is a
pretty small change
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8844#issuecomment-141841616
@shivaram, @JoshRosen, @zsxwing this may be relevant to you
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/8844#discussion_r39935722
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -474,9 +495,9 @@ class DAGSchedulerSuite
test("
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8825#issuecomment-141693510
BTW another thing you should consider is just renaming HashShuffleReader to
BlockStoreShuffleReader and still leaving in the abstract interface. The
interface
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8825#issuecomment-141688454
By the way, the reason it took contiguous partition IDs was to make them
cheap to read from disk in one read. So I'd like to try keeping it like that
before we decide
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8825#issuecomment-141688039
I already have a patch for the range of partitions, so please leave that
in. https://github.com/mateiz/spark/tree/spark-9852
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/8180#discussion_r39352275
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -720,31 +843,82 @@ class DAGScheduler(
try {
// New
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8180#issuecomment-139918503
Thanks for the comments; I've made the fixes. Let me know if anyone else
has other comments.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8180#issuecomment-138311887
@zsxwing / @squito can you take a second look at this?
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8402#issuecomment-137825173
LGTM
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8402#issuecomment-137825995
Although this seems to have failed another test?
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8180#issuecomment-137838489
Alright, I think this is ready to review now. Changes made:
- Added more docs to DAGScheduler about how stages may be re-attempted
- Added tests on:
- More
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8402#issuecomment-137882457
retest this please
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8180#issuecomment-137887534
retest this please
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8180#issuecomment-137460863
BTW, I'm working on updating this with a few more tests as suggested as
well.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8180#issuecomment-137459384
Before deciding whether it's a big change, do also take a look at the
change. As I said, it's only about 100-200 lines of actual changes, the rest is
comments
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8466#issuecomment-137197353
Hey, so is the conclusion that the DAGScheduler actually did pass the
exception to JobListeners, but we weren't listening for it in our test suite? I
thought the initial
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/7699#issuecomment-137196052
Thanks, this makes sense. Anyway this PR looks good to me.
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/8180#discussion_r38587515
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -746,6 +848,63 @@ class DAGScheduler(
submitWaitingStages
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/7699#issuecomment-135802458
Regarding the wider class of problems, I just meant that the core problem
here seems to be that tasks don't get identified correctly. This also seems to
affect other
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/7699#issuecomment-134739986
This looks good to me too. I agree it's better to use .length instead of
.size now that IntelliJ complains about it (it used not to).
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/7699#issuecomment-134740504
BTW I'd rename this JIRA or at least expand the PR description to say
track pending tasks by partition ID instead of Task objects. Otherwise it
really doesn't explain
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/7699#discussion_r37919314
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -695,6 +696,115 @@ class DAGSchedulerSuite
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/7699#discussion_r37919381
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -695,6 +696,115 @@ class DAGSchedulerSuite
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8180#issuecomment-133581945
Hey Imran, I'm curious, have you actually worked on stuff in the scheduler?
I don't know what you mean about inability to deal with complexity in it, but
it has gotten
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/8180#discussion_r37688430
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -132,13 +133,46 @@ private[spark] abstract class MapOutputTracker(conf:
SparkConf
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8280#issuecomment-132799456
@shivaram did you create a JIRA for making this affect only ShuffledRDD? I
might do it as part of https://issues.apache.org/jira/browse/SPARK-9852, which
I'm working
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8280#issuecomment-132394991
It does sound good to turn it off if there are multiple dependencies.
However, an even better solution may be to move this into ShuffledRDD, so that
we control where
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8280#issuecomment-132395677
BTW it may also be fine to turn it off by default for 1.5, but in general,
with these things, there's not much point having them in the code if they're
off by default
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8220#issuecomment-131422078
Sounds good.. I'll merge it once tests pass.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8220#issuecomment-131281158
@shivaram here it is.. we should merge this into branch-1.5 too if it's
good.
GitHub user mateiz opened a pull request:
https://github.com/apache/spark/pull/8220
[SPARK-10008] Ensure shuffle locality doesn't take precedence over narrow
deps
The shuffle locality patch made the DAGScheduler aware of shuffle data,
but for RDDs that have both narrow
GitHub user mateiz opened a pull request:
https://github.com/apache/spark/pull/8180
[SPARK-9851] Support submitting map stages individually in DAGScheduler
This patch adds support for submitting map stages in a DAG individually so
that we can make downstream decisions after seeing
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8180#issuecomment-130871144
retest this please
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8180#issuecomment-130968746
retest this please
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/8018#discussion_r36688369
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/PlatformDependent.java ---
@@ -145,21 +147,27 @@ public static void freeMemory(long address
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/8018#issuecomment-129619741
I didn't realize that Java's BigDecimal already has a shortcut for things
that fit in a Long. That definitely simplifies it. In terms of this change, the
biggest thing
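The `BigDecimal` shortcut being referred to (its internal `intCompact` long field) can be illustrated with a toy Python class. This is an illustration of the fast-path idea only, not Java's actual implementation: keep the unscaled value as a machine long when it fits in 64 bits, and fall back to arbitrary precision otherwise:

```python
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1

class CompactDecimal:
    """Toy decimal with a small-value fast path: if the unscaled value
    fits in a signed 64-bit long, store it directly and skip the
    arbitrary-precision representation entirely."""
    def __init__(self, unscaled, scale):
        self.scale = scale
        if LONG_MIN <= unscaled <= LONG_MAX:
            self.compact = unscaled   # fast path: plain long
            self.big = None
        else:
            self.compact = None       # slow path: arbitrary precision
            self.big = unscaled

    def is_compact(self):
        return self.compact is not None
```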
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/7712#issuecomment-125482355
LGTM
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/7712#discussion_r35614193
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
---
@@ -35,11 +34,12 @@ private class