Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4293#issuecomment-72509826
Updated patch adds instructions on how to avoid the exception and extends
behavior to `NewHadoopRDD`.
My opinion is still that this deserves an Exception rather
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4312#issuecomment-72515475
LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3670#issuecomment-72500076
This keeps failing random different streaming tests
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3670#issuecomment-72500090
retest this please
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4309#issuecomment-72500750
LGTM. It looks like I missed this in the shuffle of SPARK-1714.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4067#discussion_r23942054
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -47,9 +49,13 @@ private[spark] class CacheManager(blockManager:
BlockManager) extends
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4050#issuecomment-72503592
@rxin this should be fixed by SPARK-5492
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4293#issuecomment-72520547
retest this please
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23899671
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -134,12 +136,29 @@ object SparkSubmit
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23899955
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
@@ -98,6 +121,9 @@ class YarnClusterSuite extends FunSuite
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3976#issuecomment-72377952
Thanks for adding the test and getting it to work @lianhuiwang. Had a few
more comments, but this is looking close to me.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23899881
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +292,22 @@ object SparkSubmit {
// In yarn-cluster mode, use
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23899688
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +292,22 @@ object SparkSubmit {
// In yarn-cluster mode, use
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23899908
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +292,22 @@ object SparkSubmit {
// In yarn-cluster mode, use
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/4305
SPARK-5492. Thread statistics can break with older Hadoop versions
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-5492
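For context, the usual defensive pattern for this kind of version skew (a hedged sketch, not the actual SPARK-5492 patch; the probed method names are assumptions about the newer Hadoop API) is to look up the thread-statistics accessor reflectively, so that running against an older Hadoop simply disables the metric instead of throwing:

```scala
import scala.collection.JavaConverters._
import scala.util.Try
import org.apache.hadoop.fs.FileSystem

// Probe for FileSystem.Statistics.getThreadStatistics reflectively: it only
// exists on newer Hadoop versions. On older ones the probe fails, we return
// None, and callers skip the bytes-read metric rather than crashing.
def threadBytesReadCallback(): Option[() => Long] =
  Try(classOf[FileSystem.Statistics].getMethod("getThreadStatistics"))
    .toOption
    .map { getThreadStats => () =>
      FileSystem.getAllStatistics.asScala.map { stats =>
        val data = getThreadStats.invoke(stats)
        data.getClass.getMethod("getBytesRead").invoke(data)
          .asInstanceOf[Long]
      }.sum
    }
```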
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4258#issuecomment-72405667
LGTM
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23858269
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +277,22 @@ object SparkSubmit {
// In yarn-cluster mode, use
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3976#issuecomment-72240509
This looks like the right approach. Added some comments inline. Are you
able to add a test for this in `YarnClusterSuite`?
Also, one last small thing
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23858127
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +277,22 @@ object SparkSubmit {
// In yarn-cluster mode, use
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4291#issuecomment-72240710
Hi Kai, mind tagging this [SQL] so it can get properly sorted?
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23857948
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +277,22 @@ object SparkSubmit {
// In yarn-cluster mode, use
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4258#discussion_r23855340
--- Diff:
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -97,7 +87,9 @@ class KryoSerializer(conf: SparkConf)
// Use
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23856550
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -165,6 +168,13 @@ object SparkSubmit
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23856503
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -138,8 +140,9 @@ object SparkSubmit {
(clusterManager, deployMode) match
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23857534
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -103,11 +104,15 @@ private[spark] class ClientArguments(args:
Array
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3670#discussion_r23883301
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -513,13 +516,44 @@ private[spark] object Utils extends Logging {
Files.move
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4067#discussion_r23884482
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
---
@@ -241,21 +242,22 @@ object DataWriteMethod extends Enumeration
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4067#discussion_r23884519
--- Diff: core/src/main/scala/org/apache/spark/ui/ToolTips.scala ---
@@ -29,14 +29,15 @@ private[spark] object ToolTips {
val SHUFFLE_READ_BLOCKED_TIME
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23856753
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +277,22 @@ object SparkSubmit {
// In yarn-cluster mode, use
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23857834
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -148,7 +151,7 @@ object SparkSubmit {
}
// If we're
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/4293
SPARK-5500. Document that feeding hadoopFile into a shuffle operation wi...
...ll cause problems
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
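The problem SPARK-5500 documents is that `hadoopFile` reuses a single Writable object per partition, so caching or shuffling the records directly can surface the same object many times. A minimal sketch of the documented workaround (`sc` and `path` assumed in scope) is to copy each record out of the reused Writable before any shuffle:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

val lines = sc.hadoopFile[LongWritable, Text, TextInputFormat](path)
  // Copy the value out of the reused Text object before shuffling; without
  // this, every record in a partition may alias the same mutable object.
  .map { case (_, text) => text.toString }
val counts = lines.map(line => (line, 1)).reduceByKey(_ + _)
```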
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23857876
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +277,22 @@ object SparkSubmit {
// In yarn-cluster mode, use
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23857919
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +277,22 @@ object SparkSubmit {
// In yarn-cluster mode, use
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4293#issuecomment-72301039
@rxin @JoshRosen I like both of those ideas. Updated patch implements
Josh's. Reynold's is a little more involved, but would be good to implement
down the line as well
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4293#discussion_r23885959
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -308,6 +309,14 @@ class HadoopRDD[K, V](
// Do nothing. Hadoop RDD should
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3976#discussion_r23856668
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -103,11 +104,15 @@ private[spark] class ClientArguments(args:
Array
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4216#issuecomment-72289589
@pwendell @mateiz I realize I'm chiming in late here, but I think the
primary concern is that the protocol doesn't satisfy the principle of least
astonishment with respect
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3233#discussion_r23801023
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -350,9 +351,20 @@ class SparkConf(loadDefaults: Boolean) extends
Cloneable with Logging
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3233#discussion_r23805531
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -760,7 +756,13 @@ object Client extends Logging
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3233#discussion_r23804603
--- Diff:
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
---
@@ -154,20 +158,69 @@ private[spark] object
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3233#discussion_r23805723
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -451,8 +452,20 @@ private[spark] class ApplicationMaster
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3233#discussion_r23803402
--- Diff:
core/src/main/scala/org/apache/spark/executor/ExecutorURLClassLoader.scala ---
@@ -32,36 +36,52 @@ private[spark] trait MutableURLClassLoader extends
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3233#discussion_r23805787
--- Diff:
core/src/main/scala/org/apache/spark/executor/ExecutorURLClassLoader.scala ---
@@ -32,36 +36,52 @@ private[spark] trait MutableURLClassLoader extends
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3233#discussion_r23804949
--- Diff:
core/src/main/scala/org/apache/spark/executor/ExecutorURLClassLoader.scala ---
@@ -32,36 +36,52 @@ private[spark] trait MutableURLClassLoader extends
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3233#discussion_r23805094
--- Diff: docs/configuration.md ---
@@ -285,13 +285,13 @@ Apart from these, the following properties are also
available, and may be useful
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4251#issuecomment-71877377
Exactly
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3233#issuecomment-71943787
The new spark.driver.userClassPathFirst property seems a little strange to
me in that, IIUC, it only takes effect when the driver is started through the
application master
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3233#discussion_r23738607
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -375,4 +390,64 @@ private[spark] object SparkConf {
def isSparkPortConf(name
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4228#issuecomment-71957158
@mengxr makes sense
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4155#discussion_r23699876
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -63,7 +63,7 @@ class DAGScheduler(
mapOutputTracker
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4155#discussion_r23699799
--- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala ---
@@ -106,18 +107,30 @@ class SparkHadoopWriter(@transient jobConf: JobConf
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4155#discussion_r2369
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -908,6 +912,11 @@ class DAGScheduler(
val task = event.task
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4155#discussion_r23700080
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/4251
SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-5458
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4251#issuecomment-71897120
Good point, updated the patch
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4228#issuecomment-71765593
@rxin I mean making the `reduce` action able to do a tree reduce
underneath. So all reduces are tree reduces, but the default number of
levels is 1.
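As a sketch of the idea under discussion (assuming the `treeReduce` API this thread is converging on), a multi-level reduce combines partial results in rounds instead of sending every partition's result straight to the driver, and with one level it degenerates to a plain `reduce`:

```scala
val rdd = sc.parallelize(1 to 1000, 100)
// Plain reduce: all 100 partition results go straight to the driver.
val flat = rdd.reduce(_ + _)
// Tree reduce: partial sums are combined in two rounds, easing the load
// on the driver when there are many partitions.
val tree = rdd.treeReduce(_ + _, depth = 2)
// Both compute the same total.
```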
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4228#issuecomment-71768075
My thinking was just to simplify the API. I.e. if numLevels is 1, we could
branch to the old implementation. I don't have a strong opinion either way,
but was thinking
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4212#issuecomment-71740268
This looks reasonable. Was initially worried that changing the order might
mess with the effect of classloader instantiation on the thread pool but on
deeper inspection
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4228#issuecomment-71756156
I think this would be a great API to add. Have you weighed adding a
numLevels argument to `reduce` itself instead of a new method?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4051#issuecomment-71510560
What would be the advantage of limiting executor requests to the number of
NMs in the cluster?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4205#issuecomment-71496062
Hi @kdatta, mind giving this a descriptive title and the [GRAPHX] tag so it
can get sorted properly?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4205#issuecomment-71499617
Exactly
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4050#issuecomment-71537583
If we use an InputFormat whose splits aren't instances of
org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}, then we
can't get input metrics information
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4192#discussion_r23563321
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -60,6 +62,9 @@ private[yarn] class YarnAllocator(
import
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4192#discussion_r23564448
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -60,6 +62,9 @@ private[yarn] class YarnAllocator(
import
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4168#discussion_r23566205
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -199,14 +199,31 @@ private[spark] class ExecutorAllocationManager
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4192#discussion_r23591435
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -60,6 +62,9 @@ private[yarn] class YarnAllocator(
import
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4050#issuecomment-71563355
Edited the JIRA title and added tests for the CombineFileSplits. Tested
both against Hadoop 2.3 (which doesn't support getFSBytesReadCallback) and
Hadoop 2.5 (which does
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4050#discussion_r23563465
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -218,13 +219,14 @@ class HadoopRDD[K, V](
// Find a function
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4141#discussion_r23564546
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -192,15 +186,32 @@ private[yarn] class YarnAllocator
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4051#issuecomment-71406518
@sryza Is the point of not requiring these configs that the users don't
really know how many executors they actually want?
Exactly. From my perspective, one
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/4192
SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-5393
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/4168
SPARK-4136. Under dynamic allocation, cancel outstanding executor requests
when no longer needed [WIP]
This takes advantage of the changes made in SPARK-4337 to cancel pending
requests to YARN when
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4150#issuecomment-71055028
I think this is a duplicate of #4050, which only adds support for
`CombineFileSplit`s. We shouldn't add support for generic `InputSplit`s
because many input formats do
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/4164
SPARK-5370. [YARN] Remove some unnecessary synchronization in YarnAlloca...
...tor
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/4141
SPARK-4337. Add ability to cancel pending requests to YARN
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-4337
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4156#issuecomment-70976336
Hi @saucam, mind tagging this with [SQL] so it can get properly sorted?
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4067#discussion_r23284264
--- Diff:
core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala
---
@@ -82,7 +82,15 @@ private[hash] object
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4123#issuecomment-70799955
We should warn about this in standalone mode and mesos coarse grained mode
as well, right?
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4067#discussion_r23284115
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
---
@@ -238,6 +245,10 @@ case class InputMetrics(readMethod:
DataReadMethod.Value
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4123#discussion_r23283721
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -19,7 +19,7 @@ package org.apache.spark.deploy.yarn
import
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4067#discussion_r23284251
--- Diff:
core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala
---
@@ -82,7 +82,15 @@ private[hash] object
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4067#discussion_r23284154
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
---
@@ -317,4 +338,9 @@ class ShuffleWriteMetrics extends Serializable
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4067#discussion_r23284162
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -41,7 +41,7 @@ import org.apache.spark._
import
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3765#discussion_r23263022
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -153,498 +154,241 @@ private[yarn] class YarnAllocator
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3765#issuecomment-70762709
@tgravescs, uploaded a new patch that addresses your review comments. I
just ran a bunch of manual tests on a 6-node cluster, including
* request more resources than
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3765#discussion_r23261539
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -153,498 +154,241 @@ private[yarn] class YarnAllocator
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/3765#discussion_r23261733
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -153,498 +154,241 @@ private[yarn] class YarnAllocator
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4051#issuecomment-70613708
I updated the patch to add a `spark.dynamicAllocation.initialExecutors`
property. I also removed the requirement to set min/maxExecutors, so the user
now only needs
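The properties under discussion would presumably be set together along these lines (a hypothetical configuration fragment illustrating the comment above, with min/maxExecutors now optional; exact names and defaults are assumptions):

```
spark.dynamicAllocation.enabled           true
spark.dynamicAllocation.initialExecutors  5
spark.dynamicAllocation.minExecutors      0
spark.dynamicAllocation.maxExecutors      100
```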
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4051#discussion_r23205045
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -73,12 +73,12 @@ private[spark] class ClientArguments(args:
Array
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4020#issuecomment-70607997
@pwendell sorry, was out for the weekend, but this LGTM.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3670#issuecomment-70227794
Ok here's a version that's ready for review. It still needs a little more
doc, polish, and a test or two, but would like to get validation on the approach.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4051#issuecomment-70295188
To distill the motivation on the JIRA and make sure we're on the same page:
in most situations (including Hive-on-Spark), users don't or can't know how
many resources
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4020#issuecomment-70189173
Had a look over, and this mostly looks good, but it looks like there are
many places where the patch replaces assigning with incrementing. It would be
good to take
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4020#discussion_r23055589
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -257,8 +257,8 @@ private[spark] class Executor(
val serviceTime
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4020#discussion_r23057341
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
---
@@ -240,10 +284,18 @@ class ShuffleWriteMetrics extends Serializable
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4067#discussion_r23061158
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -17,6 +17,8 @@
package org.apache.spark
+import
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4020#discussion_r23057216
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -257,8 +257,8 @@ private[spark] class Executor(
val serviceTime
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/4050
SPARK-5199. Input metrics should show up for InputFormats that return Co...
...mbineFileSplits
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4050#discussion_r22972019
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -219,6 +220,9 @@ class HadoopRDD[K, V](
val bytesReadCallback