Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5577#issuecomment-97134710
Since it bumps numRecordsWritten on both the key and the value, it looks like
this change would make us double count the number of records written.
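For illustration, a minimal sketch of the double-counting concern; the class and method names here are made up and only mirror the pattern under discussion:

```scala
// If the write path bumps the records counter on both the key write and the
// value write, every record is counted twice.
class WriteMetrics {
  var recordsWritten = 0L
  def incRecordsWritten(n: Long): Unit = recordsWritten += n
}

def writeRecord(metrics: WriteMetrics, key: Array[Byte], value: Array[Byte]): Unit = {
  // ... write the key bytes ...
  metrics.incRecordsWritten(1) // bumped once for the key
  // ... write the value bytes ...
  metrics.incRecordsWritten(1) // bumped again for the value: one record, two increments
}
```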
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5704#issuecomment-97237212
I sync'd up with @andrewor14 offline about this. To summarize my
understanding of his concern, the current patch has non-optimal behavior in the
following (common
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5704#issuecomment-97237326
I'll update the current patch to include this.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5403#issuecomment-97151475
My opinion is that the main criteria for including this are:
* Is there an intention for and clear path to a state where this could be
used in production? I think it's
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5751#issuecomment-97173195
Minor comment. Otherwise LGTM.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5751#discussion_r29277660
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1387,6 +1387,11 @@ class SparkContext(config: SparkConf) extends Logging
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r29190908
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r29192947
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala
---
@@ -0,0 +1,117 @@
+/*
+ * Licensed
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r29191148
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -740,15 +723,29 @@ private[spark] class ExternalSorter[K, V, C
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r29191682
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala
---
@@ -0,0 +1,254 @@
+/*
+ * Licensed
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r29193245
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala
---
@@ -0,0 +1,117 @@
+/*
+ * Licensed
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5716#issuecomment-96741889
LGTM
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5719#discussion_r29167101
--- Diff: docs/running-on-yarn.md ---
@@ -134,6 +134,13 @@ Most of the configs are the same for Spark on YARN as
for other deployment modes
</td>
</tr>
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5242#issuecomment-96744404
That's correct. Only for executors, but I updated the config naming after
your earlier comment so that we have room to add it for the AM in the future.
I just filed SPARK
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r29171158
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4450#issuecomment-96844045
Updated patch rebases on master and incorporates @pwendell's review
comments.
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/5704
SPARK-6954. ExecutorAllocationManager can end up requesting a negative number of executors
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-96334096
#5704 demonstrates what I outlined in my above comment, and should
supersede this PR.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r29112442
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r29112697
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala
---
@@ -0,0 +1,117 @@
+/*
+ * Licensed
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5676#issuecomment-95822131
This looks like a duplicate of SPARK-6954 (PR #5536)
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5403#issuecomment-95686597
I've seen a lot of experimental features go into Hadoop and then slowly
wilt from lack of use and maintenance.
My two cents are that we should only include
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5627#issuecomment-95259018
LGTM. One tiny nit is that `import org.apache.spark.executor._` could go
to `import org.apache.spark.executor.{DataReadMethod, ShuffleWriteMetrics}`
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r28892668
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/4450#discussion_r28892612
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala
---
@@ -0,0 +1,254 @@
+/*
+ * Licensed
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4450#issuecomment-95272806
Thanks for the review, Patrick.
Regarding Kryo: I would be really really surprised if Kryo were to change
its serialization format in such a drastic way without
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-94915882
Thanks for putting together those graphs, Cheolsoo. I think they're really
helpful for thinking about this.
The side effect that I'm worried about
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5608#issuecomment-94910837
@srowen I haven't had a chance to look at this in detail yet, but, with
SizeEstimator, we're purely trying to understand the size of the object in
memory - the disk
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-94945086
Thanks Andrew, I think you've nailed the part of the problem that I was
missing: when executors die or are killed, executorIds.size decreases. If
numPendingExecutors
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5620#issuecomment-95010096
This LGTM
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5617#discussion_r28842177
--- Diff:
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -61,7 +61,16 @@ private[deploy] class SparkSubmitArguments(args: Seq
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5577#issuecomment-94563008
My patch for SPARK-4550 (#4450) makes a similar change.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-94595477
Mind if I take a look before you merge? I think I'm the last one to have
touched this code.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-94610887
I've thought about this more and think I'm able to articulate why I think
the present fix is incorrect.
Aside from the weirdness of the idea that there can
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-94611689
More broadly:
I agree with you that the current code is really really difficult to reason
about. When I added the ability to cancel executor requests, I spent
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-93665373
It would be good to come up with a test that can reproduce the issue. I
believe that it is actually supposed to be acceptable for numExecutorsPending
to sit below 0
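One possible reading of why a below-zero value can be legitimate, as a worked sketch; this is an interpretation of the bookkeeping, not the actual ExecutorAllocationManager fields:

```scala
// If the desired total drops below what is already running, the deficit is a
// signal that executors should be released, not necessarily an error state.
val running = 10                            // executors currently registered
val target  = 8                             // desired total after a downscale
val numExecutorsPending = target - running  // -2: two executors slated to go away
```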
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5478#issuecomment-93778753
Regarding the performance issue, this can be solved with the YARN
distributed cache in the same way it works for the Spark assembly jar. If the
file is placed on HDFS
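A sketch of what that could look like from the submitting application's side, assuming the Spark 1.x `spark.yarn.jar` config; the HDFS path is made up:

```scala
import org.apache.spark.SparkConf

// A file already on HDFS is localized once per node by YARN's distributed
// cache instead of being shipped from the client on every submission.
val conf = new SparkConf()
  .set("spark.yarn.jar", "hdfs:///shared/spark/spark-assembly-1.3.1-hadoop2.6.0.jar")
```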
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93785353
@pwendell thought this would be preferable:
One thing we could do that would be more isolated is have a function in
SparkContext called estimateSizeOf(object: Any
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5500#issuecomment-93802687
@mengxr that makes sense to me, but did you mean to switch 1 and 2? I.e.
in this PR we should implement `OneHotEncoder`, which takes an input column
with category indices
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-93801808
Awesome, sounds good Cheolsoo.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5500#issuecomment-93841917
Updated the patch to conform to the approach described above.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5478#issuecomment-93490885
That alternative solution makes sense to me. If it's not going to be added
to the classpath, it might make more sense to use a zip than a jar.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5492#issuecomment-93490325
This seems to only get called from two places:
* In standalone mode when the driver fails to communicate with the master,
or the master sends a message to kill
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93487382
Again, one of the main uses is estimating the size of variables you're
considering broadcasting. Another is experimenting with different
representations - e.g. how much
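As a sketch of the broadcast-sizing use case: `SizeEstimator.estimate` (exposed as a DeveloperApi in later Spark releases) gives a rough in-memory footprint; the lookup table here is invented:

```scala
import org.apache.spark.util.SizeEstimator

val lookup: Map[Int, String] = (1 to 100000).map(i => i -> ("value-" + i)).toMap
val bytes = SizeEstimator.estimate(lookup)
println(f"estimated in-memory size: ${bytes / 1048576.0}%.1f MB")
```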
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5368#issuecomment-93492480
An RDD that handles read exceptions with `Try` does seem useful to me -
I've seen a decent amount of boilerplate used to accomplish the same thing. Is
there a reason
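The boilerplate being referred to usually looks something like this sketch, assuming an `sc: SparkContext` in scope and a hypothetical `parse` function:

```scala
import scala.util.{Failure, Success, Try}

def parse(line: String): Int = line.trim.toInt // stands in for any fallible read

val attempts = sc.textFile("hdfs:///data/input").map(line => Try(parse(line)))
val values   = attempts.collect { case Success(v) => v }            // keep the good records
val failures = attempts.collect { case Failure(e) => e.getMessage } // inspect the bad ones
```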
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5492#issuecomment-93527717
Ok. LGTM. Could make sense for someone with Mesos expertise to look at it.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5242#issuecomment-93580862
@tgravescs the first will apply to the AM as well if one isn't specifically
set for the AM:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5242#discussion_r28467525
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -231,6 +249,17 @@ private[yarn] class YarnAllocator
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5242#issuecomment-93257613
retest this please
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5242#issuecomment-92959186
Updated patch still only supports executor node labels, but renames the
property so that adding driver/am node labels in the future won't look weird.
I also switched
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5506#issuecomment-92944389
LGTM as well
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5242#issuecomment-92947103
@srowen @tgravescs sorry for the delay here. I've been working on a new
patch that adds support for AM label expressions, but ran into some YARN API
issues that make
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5242#discussion_r28383831
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -123,6 +123,7 @@ private[spark] class Client(
appContext.setQueue
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5242#issuecomment-93119630
It turns out that setting the label expression when creating the
`ApplicationSubmissionContext` sets it for the AM as well, which is
undesirable. Updated patch reverts
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5063#issuecomment-93040635
@andrewor14 this was a concern I had as well, and there's a long discussion
above. The reason for not using `spark.executor.cores` is that the property
introduced here
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5063#issuecomment-93058391
My understanding based on the discussion here is that
`spark.mesos.executor.cores` is the number of cores reserved by an executor
*not* for use in running tasks. So
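Under that reading (an assumption, per the discussion above), the setting would be used something like this:

```scala
import org.apache.spark.SparkConf

// Cores the Mesos executor process reserves for its own overhead,
// separate from the cores offered to tasks.
val conf = new SparkConf().set("spark.mesos.executor.cores", "1")
```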
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5478#issuecomment-93061403
IIUC, the motivation for this change is that the assembly jar distribution
mechanism doesn't work for some Java versions.
I agree with Andrew that, if at all
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5449#discussion_r28278175
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -94,6 +98,14 @@ private[yarn] class YarnAllocator(
// Additional
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/5500
SPARK-5888. [MLLIB]. Add OneHotEncoder as a Transformer
This patch adds a one hot encoder for categorical features. Planning to
add documentation and another test after getting feedback
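A minimal illustration of the transformation itself (not the PR's actual Transformer API): a category index becomes a vector that is 1.0 at that index and 0.0 elsewhere:

```scala
def oneHot(categoryIndex: Int, numCategories: Int): Array[Double] =
  Array.tabulate(numCategories)(i => if (i == categoryIndex) 1.0 else 0.0)

// oneHot(2, 4) == Array(0.0, 0.0, 1.0, 0.0)
```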
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5477#issuecomment-92235067
This LGTM as well
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5449#discussion_r28218090
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -94,6 +98,14 @@ private[yarn] class YarnAllocator(
// Additional
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5449#discussion_r28217902
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -119,7 +131,27 @@ private[yarn] class YarnAllocator(
def
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5321#issuecomment-91030087
Thanks @piaozhexiu.
This seems reasonable to me.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4142#issuecomment-91031234
LGTM
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4142#issuecomment-90232072
Is my understanding correct that the only change since the last version is
that we bundle all the files together instead of uploading them individually?
Can we
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5376#issuecomment-90254901
This seems reasonable and mostly straightforward to me. What was your
method for finding all the instances of this?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4142#issuecomment-90243071
I don't think the whole method needs to be refactored right now, but
there's a 50-line contiguous addition to it. Can that go in its own method?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/4142#issuecomment-90299451
I'm advocating for a sub-method `private def createHadoopConfArchive():
File` to house collecting and zipping the conf files.
This stuff obviously makes sense
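A rough sketch of what such a helper could look like; the name comes from the comment above, while the conf-dir lookup and buffer handling are assumptions (the `private` modifier is dropped so the sketch stands alone):

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

def createHadoopConfArchive(): File = {
  val confDir = new File(sys.env.getOrElse("HADOOP_CONF_DIR", "/etc/hadoop/conf"))
  val archive = File.createTempFile("hadoop-conf", ".zip")
  val out = new ZipOutputStream(new FileOutputStream(archive))
  try {
    // Add each regular file in the conf dir as a zip entry.
    for (f <- Option(confDir.listFiles()).getOrElse(Array.empty[File]) if f.isFile) {
      out.putNextEntry(new ZipEntry(f.getName))
      val in = new FileInputStream(f)
      try {
        val buf = new Array[Byte](8192)
        Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(out.write(buf, 0, _))
      } finally in.close()
      out.closeEntry()
    }
  } finally out.close()
  archive
}
```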
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5363#discussion_r27778826
--- Diff: docs/building-spark.md ---
@@ -98,10 +98,10 @@ mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0
-DskipTests clean package
# Apache Hadoop 2.3.X
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5343#issuecomment-89343218
I see. Do Hive/Pig/Sqoop have active clients or are they just monitoring
progress in the same way the Spark yarn-cluster client is?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5343#issuecomment-89337784
I'm somewhat dubious about whether this is needed. It takes a single
command to kill a job with `yarn application -kill`. Also, the advantage of the
latter
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5351#issuecomment-89379598
Mind tagging this with MLLIB?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5297#issuecomment-89430618
LGTM as well
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5356#discussion_r27760819
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
@@ -143,7 +143,8 @@ class YarnClusterSuite extends FunSuite
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5297#discussion_r27758229
--- Diff: docs/running-on-yarn.md ---
@@ -196,6 +196,15 @@ Most of the configs are the same for Spark on YARN as
for other deployment modes
It should
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5356#discussion_r27758595
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
@@ -143,7 +143,8 @@ class YarnClusterSuite extends FunSuite
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5297#discussion_r27694636
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -609,6 +593,26 @@ private[spark] class Client(
throw new SparkException
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5338#issuecomment-89097896
LGTM
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5297#discussion_r27694617
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -609,6 +593,26 @@ private[spark] class Client(
throw new SparkException
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5297#discussion_r27585630
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -66,6 +66,8 @@ private[spark] class Client(
private val
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5292#discussion_r27589374
--- Diff:
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
---
@@ -125,6 +125,7 @@ private[spark] class
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5311#discussion_r27590421
--- Diff:
core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala ---
@@ -28,10 +28,20 @@ import org.apache.spark.util.ManualClock
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5311#discussion_r27591507
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -173,32 +179,24 @@ private[spark] class ExecutorAllocationManager
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5311#discussion_r27611599
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -17,10 +17,12 @@
package org.apache.spark
+import
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5311#issuecomment-88631842
One last nit. Otherwise LGTM.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5306#discussion_r27619143
--- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
@@ -71,12 +75,22 @@ private[spark] class HeartbeatReceiver(sc: SparkContext
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5292#discussion_r27539943
--- Diff:
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
---
@@ -125,6 +125,7 @@ private[spark] class
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5276#discussion_r27539637
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
@@ -113,9 +113,11 @@ private[spark] class OutputCommitCoordinator
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5294#issuecomment-88165688
@srowen I assume this would be for running assemblies that already include
the Hadoop classes. @Sephiroth-Lin do you mind going into detail about the
situations you need
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5297#issuecomment-88168832
If somebody has an existing script that runs spark-submit and then follows
it with an action that expects the app to have completed, this would break it,
right?
I
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5294#discussion_r27530621
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -809,7 +809,13 @@ object Client extends Logging
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5294#discussion_r27530532
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -809,7 +809,13 @@ object Client extends Logging
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/5294#discussion_r27530361
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -809,7 +809,13 @@ object Client extends Logging
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5258#issuecomment-87581081
Did you mean to submit this as a pull request?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5252#issuecomment-87722975
Mind tagging this with SQL so it gets sorted properly?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5259#issuecomment-87724903
Is this the same as the issue in SPARK-5360? @kayousterhout
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5272#issuecomment-8269
Mind tagging this with [MLLIB]? Also, it seems weird to have any logging
here - the equivalent in LinearRegression doesn't. I would just take the
println out
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5063#issuecomment-87791403
LGTM
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5250#issuecomment-87793634
@rxin can you explain that a little more? I was envisioning an
`InputFormat` that wrapped other `InputFormat`s and ignored exceptions, not
something that would require
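To make the envisioned wrapper concrete, a sketch against the Hadoop `mapreduce` API; the class name and the swallow-and-stop error policy are assumptions, and a real job would need a reflection-friendly no-arg constructor rather than a delegate parameter:

```scala
import org.apache.hadoop.mapreduce.{InputFormat, InputSplit, JobContext, RecordReader, TaskAttemptContext}

class FaultTolerantInputFormat[K, V](delegate: InputFormat[K, V]) extends InputFormat[K, V] {
  override def getSplits(context: JobContext) = delegate.getSplits(context)

  override def createRecordReader(split: InputSplit, context: TaskAttemptContext): RecordReader[K, V] = {
    val inner = delegate.createRecordReader(split, context)
    new RecordReader[K, V] {
      override def initialize(s: InputSplit, c: TaskAttemptContext): Unit = inner.initialize(s, c)
      // Treat a failed read as end-of-split instead of failing the task.
      override def nextKeyValue(): Boolean = try inner.nextKeyValue() catch { case _: Exception => false }
      override def getCurrentKey: K = inner.getCurrentKey
      override def getCurrentValue: V = inner.getCurrentValue
      override def getProgress: Float = inner.getProgress
      override def close(): Unit = inner.close()
    }
  }
}
```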
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5250#issuecomment-87791639
My opinion is that this is the kind of thing that should go into a custom
`InputFormat`.